Abstract

A  transmitter  antenna  array  has  the  ability  to  direct  data  simultaneously  to  multiple 
receivers  within  a  wireless  network,  creating  potential  for  a  more  integrated  view  of 
algorithmic  system  components.  In  this  thesis,  such  a  perspective  informs  the  design 
of  two  system  tasks:  the  scheduling  of  packets  from  a  number  of  data  streams  into 
groups;  and  the  subsequent  spatial  multiplexing  and  encoding  of  these  groups  using 
array  processing.  We  demonstrate  how  good  system  designs  can  help  these  two  tasks 
reinforce  one  another,  or  alternatively  enable  tradeoffs  in  complexity  between  the  two. 
Moreover, scheduling and array processing each benefit from a further awareness of
both  the  fading  channel  state  and  certain  properties  of  the  data,  providing  information 
about  key  flexibilities,  constraints  and  goals. 

Our  development  focuses  on  techniques  that  lead  to  high  performance  even  with 
very low-complexity receivers. We first consider spatial precoding under simple
scheduling and propose several extensions for implementation, such as a unified
time-domain precoder that compensates for both cross-channel and intersymbol
interference. We then show how more sophisticated, channel-aware scheduling can reduce the
complexity  requirements  of  the  array  processing.  The  scheduling  algorithms  presented 
are  based  on  the  receivers’  fading  channel  realizations  and  the  delay  tolerances  of  the 
data  streams.  Finally,  we  address  the  multicasting  of  common  data  streams  in  terms 
of  opportunities  for  reduced  redundancy  as  well  as  the  conflicting  objectives  inherent 
in  sending  to  multiple  receivers.  Our  channel-aware  extensions  of  space-time  codes  for 
multicasting  gain  several  dB  over  traditional  versions  that  do  not  incorporate  channel 
knowledge. 

Thesis  Supervisor:  Gregory  W.  Wornell 
Title:  Professor  of  Electrical  Engineering 
List of Commonly Used
Mathematical  Symbols 


The  following  symbols  come  up  often  throughout  the  thesis,  and  care  has  been  taken 
to define them consistently. Other symbols come up less often, some of which may
occasionally  be  used  for  more  than  one  purpose. 


H, h    Channel matrix, channel vector
K       Number of receivers
M       Number of transmit antenna elements
s       Input data stream symbols
x       Outputs of transmit antenna elements
w       Additive noise at receivers
y       Received symbols
P       Maximum transmit power
N_0     Noise power (at each receiver)
E       Expectation operator
†       Conjugate transpose operator
        Conjugate transpose and time reversal operator
Chapter 1


Introduction 


Wireless  communication  has  been  expanding  at  an  impressive  rate  for  some  time 
now.  Yet  in  this  climate,  engineers  still  struggle  with  fundamental  questions  about 
network  architecture  and  the  underlying  physical  limitations  of  communicating  over 
airwaves.  This  thesis  hopes  to  contribute  to  this  discussion  with  improvements  in  the 
understanding  and  design  of  antenna  array  systems  to  address  these  issues. 

Many  wireless  network  architectures  are  amenable  to  the  limited  use  of  arrays. 
In cellular systems, mobile devices are divided among geographic cells and only
communicate directly with a base station associated with their current cell. Wireless
ad-hoc networks do not have such central control; a local set of devices is able to
self-configure. Many times, however, it is still useful to route communications through
a single node that has internetwork connectivity and no battery-life constraints.
In these and other examples, users are divided into relatively simple, inexpensive
devices and a smaller number of more powerful nodes. The latter type, with their less
stringent  constraints  on  power,  size,  and  computation,  become  natural  candidates  for 
the  use  of  a  multiple-element  array. 

In  this  thesis,  we  consider  such  a  model  and  focus  on  the  interactions  between 
a  single  array  device  and  its  associated  wireless  users.  Furthermore,  we  concentrate 
on the less-understood “downstream” direction (that is, from the base station
toward the various receivers). Since the receivers are battery-limited and typically do
not  have  a  great  amount  of  coordination,  responsibility  for  ensuring  high  rates  and 
avoiding  interference  falls  mainly  on  the  transmitter  and  is  the  main  subject  of  our 
research.  Global  issues  such  as  handoff  among  base  stations  are  important,  but  will 
be considered beyond the scope of this thesis. We will see that the single-array
configuration alone offers many opportunities for performance improvements as well as
difficult design decisions.

Figure 1-1: Block diagram of transmitter with an antenna array, illustrating scheduling
and array processing system tasks.

A major reason for both the complexity and the potential of a transmitter
antenna array is that it can direct data to multiple receivers simultaneously. Such a
strategy affects many system components and is reflected in our transmitter system
architecture, shown in Fig. 1-1. We use a packet-based, streaming data model, where
some streams may be intended for individual receivers while others are common to
more than one. Many types of data, such as voice, video, and file transfers, can be
modeled in this way. We partition the processing of this data into two system tasks,
denoted scheduling and array processing. The scheduler divides time into blocks and
decides which data will be sent over each block. In the example shown, packets from
the first two streams are sent in the first time block, etc. Once this has been decided,
the transmitter must then map the data onto the physical antenna outputs in a way
that will allow the receivers to understand the messages with sufficient fidelity. This
is the function of the second task, which we call array processing, and can encompass
multiple-input, multiple-output processing; modulation; coding; and other elements
at the signaling level. Scheduling and array processing roughly correspond to the
standard medium access control (MAC) and physical layers, although some elements
of both layers will be present in each of the two tasks.

The standard approach to these types of problems has been through layered
protocols, where functions at different levels of abstraction are considered separately. For
example, the networking community often concentrates on scheduling while assuming
a reliable, interference-free channel. Array processing research, on the other hand,
generally does not consider how the streams are selected or what their different
properties may be. For array systems in particular, however, performance will depend
strongly on the interaction among the data, scheduling, array processing, and
physical channel. This suggests both a more comprehensive design process and greater
integration, or at least awareness, among the different system components. Recently,
there has been some interest in the 802.11 community in designing scheduling
algorithms that are more aware of the physical channel and array processing (see [51] and
references therein), though the emphasis for the most part has been on incremental
upgrades of existing systems. In this thesis, we hope to develop a more complete
understanding of scheduling, advanced array processing techniques, and their
interactions as they relate to different system goals.

We  investigate  both  scheduling  and  array  processing  with  an  eye  toward  helping 
the  two  tasks  reinforce  one  another.  An  important  part  is  incorporating  knowledge, 
at  both  levels,  of  the  state  of  the  physical  channel  and  the  goals  and  destinations 
of  individual  data  streams.  Alternatively,  we  also  consider  tradeoffs  in  complexity 
between  the  two,  where  computation  can  be  placed  in  one  task  or  the  other  depending 
on  implementation  concerns.  In  many  cases,  a  good  portion  of  the  potential  gains  are 
available  when  only  one  side  incorporates  a  high  degree  of  sophistication.  For  example, 
we  adapt  signaling-level  precoding  techniques  to  satisfy  different  kinds  of  data  goals, 
and  develop  channel-aware  scheduling  techniques  that  enable  high  performance  under 
lower-complexity  choices  for  array  processing. 


1.1  Outline  of  Thesis 

Chapter 2 lays the groundwork with an overview of several concepts related to
transmitter antenna arrays. We discuss how elements of the fading channel model relate to
the challenges and performance goals with which the rest of the thesis is concerned.
Different signaling strategies lead to two basic performance criteria, outage
probability and ergodic capacity, which are important to keep concrete and distinct. We
also provide motivation for scheduling several streams simultaneously and summarize
some of the well-known array processing techniques on which later chapters build.

In  Chapter  3,  we  focus  on  the  array  processing  side  while  assuming  a  simple 
scheduler  that  divides  streams  into  sequential  or  random  groups.  We  primarily  build 
upon  the  spatial  precoding  techniques  described  by  Caire  and  Shamai  [7]  and  Ginis 
and  Cioffi  [30],  which  in  turn  were  adapted  from  precoding  for  intersymbol  interference 
and information embedding. Recent results have shown that this family of techniques
achieves the maximum sum capacity across all receivers (in [7] for the two-receiver
channel, and [82, 71, 75] for any number of receivers).

We introduce precoding with a matrix formulation that emphasizes the connection
to other strategies and makes evident various options and extensions. We then
develop implementation aspects, such as robustness, constellation design, and meeting
different types of performance criteria. For example, the maximum sum capacity
solution can cause a large asymmetry in performance among receivers; we show how
a modified order of operations results in a more equitable distribution. We conclude
with a unified method of precoding for interference across both time and different
streams and compare it to the multitone solution advocated in [30].

Chapter  4  shifts  the  focus  to  channel-aware  scheduling  and  how  it  can  improve 
performance.  Such  schedulers  must  be  in  tune  with  goals  and  constraints  of  the  data 
streams;  we  develop  algorithms  for  three  data  classes  distinguished  by  their  delay 
tolerance  relative  to  certain  physical  parameters.  Although  further  development  is 
required before these algorithms can provide some standard quality of service
guarantees, they do show some dramatic potential improvements. Especially promising
is their ability to select subsets of streams that induce very low interference. In one
example, under beamforming from an 8-element array, the medium-delay algorithm
exhibits a 20 dB gain at 1% outage and more than double the ergodic capacity
compared with a random grouping of streams. This places performance in the range of
precoding, with much lower complexity at the array processing level. Because
precoding systems start off better, scheduling cannot provide as dramatic an improvement,
but still pushes performance toward certain idealized limits and improves robustness.

In  Chapter  5,  we  take  a  closer  look  at  multicast  scenarios  where  streams  are 
intended for more than one receiver. In these cases, the scheduler and array
processing can work together to transmit to all recipients simultaneously and avoid the
redundancy of duplication. Unfortunately, benefits decrease as the number of
recipients grows and it becomes more difficult to direct the stream simultaneously to
all of them. Using a single such stream for illustration, we describe a way to think
about the balance of competing objectives in terms of efficient operating points. We
then discuss methods that achieve these operating points, which we call space-time
multicast codes, as well as more practical implementations. When the number of
recipients is small or ergodic capacity is most important, we determine that
beamforming strategies are a good choice. In the more general case, we show how to adapt
ordinary space-time codes to this multicast scenario. In our example, these gain up
to 6 dB at 1% outage over methods that do not use channel information and instead
spread transmission out to all possible receivers. Furthermore, the channel
information allows these multicast groups to fit more naturally into the larger picture of a
system with heterogeneous sets of data and receivers.

We provide some concluding remarks and directions for future research in
Chapter 6.
Chapter 2


Background  on  Transmitter 
Antenna  Arrays 


The  recent  interest  in  wireless  communication  has  resulted  in  a  large  number  of  system 
models,  algorithmic  structures,  and  channel  assumptions.  In  this  chapter,  we  describe 
elements  from  our  framework  and  introduce  notation  and  concepts  that  will  be  used 
in  later  discussion. 

We build up our channel model from a single link to timesharing to spatial
multiplexing of multiple streams. Although we will mainly deal with transmitter arrays,
the single link system is enough to illustrate different signaling approaches toward
fading channels. This directly relates to the way we will classify data and judge
performance throughout the rest of the thesis. We then introduce arrays, and quickly
review some major issues and traditional array processing techniques. For a more
comprehensive description of wireless communications systems, the reader is referred
to Jakes’ book [39] or the more recent review article by Biglieri, et al. [6].


2.1  Notational  Conventions 

Scalars are given by lowercase letters (a), vectors by boldface lowercase letters (a), and
matrices by boldface uppercase letters (A). Certain constants or parameters are given
by standard uppercase letters (A). When appropriate, explicit time dependences are
shown using square brackets a[n]. Complex conjugation is denoted a*, and † is the
matrix Hermitian (conjugate transpose). Elements of vectors or matrices are denoted
using subscripts ($a_i$ or $A_{i,j}$), with the first element indexed by 1. If a is a random
variable, then E[a] is its expectation.


2.2  Communication with Single-Element Antennas

When one talks about “wireless communication,” what is usually meant are
electromagnetic information-bearing signals, transmitted and received from some kind of
antennas, and propagating without waveguides. Therefore, they are subject to
thermal noise, propagation loss that increases with distance, and interference from other
wireless  signals.  Also  important  are  the  self-interference  effects  of  reflections  that 
depend  greatly  on  the  particular  geometry  of  buildings,  walls,  and  other  objects  in 
and  around  the  path  between  the  transmitter  and  receiver. 

This  last  effect  requires  more  discussion  since  it  introduces  a  random  element 
called  fading  that  is  the  reason  for  much  of  the  research  in  wireless  communications. 
Reflections are received as multiple copies of the same signal, and cause different
effects depending upon the difference in arrival times. If the receiver samples the signal
quickly enough, the different arrivals will become resolvable as separate delays. In
this thesis, however, we will usually assume a narrowband model with symbol-spaced
sampling so that multipath arrivals are not resolvable. The arrivals can then
combine constructively or destructively, resulting in amplitude variations. The maximum
bandwidth to ensure this flat fading behavior is called the coherence bandwidth.
Unfortunately, no exact formula exists to compute its value, although one rule of thumb
is $1/\tau_{\mathrm{rms}}$, where $\tau_{\mathrm{rms}}$ is the RMS delay spread of the arrivals [54]. Observed values
of this parameter vary, but some studies place it in the tens of nanoseconds for
indoor environments, and on the order of a few microseconds for urban environments.
Even when the fading is not precisely flat, many of our general findings still apply
when  receivers  compensate  with  equalization  techniques  or  the  transmitters  use  more 
generalized  precompensation  such  as  discussed  in  Section  3.3. 

The essential elements of this channel model can be expressed in the equivalent
complex discrete-time baseband model (where all time dependencies have been
suppressed)

$$ y = h^{*} x + w, \qquad (2.1) $$

where  y  is  the  received  symbol,  x  is  the  transmitted  symbol,  h  is  the  channel  or  fading 
coefficient, and w is additive noise that encompasses thermal noise and any unmodeled
background interference. All of the variables in (2.1) are complex-valued scalars.
Unless specified otherwise, the additive noise w will be a zero-mean, independent,
identically-distributed, circularly-symmetric Gaussian random sequence with variance
$N_0$. Both transmitter and receiver are assumed to know the noise variance $N_0$, but not
the particular realization w. Throughout this thesis, we enforce a constraint on the
expected transmitted power, $E[|x|^2] \le P$, and investigate how various scheduling and
array  processing  approaches  improve  received  performance.  This  constraint  is  meant 
to  incorporate  physical  limitations,  government  regulatory  issues,  and  the  practical 
issue  of  keeping  interference  to  a  local  set  of  receivers  such  as  one  cell  in  a  cellular 
environment.  (Wider  network-level  issues  involving  multiple  transmitters  are  beyond 
the  scope  of  this  thesis.)  Alternatively,  one  could  use  our  results  to  achieve  the 
received  performance  of  current  systems  at  reduced  power. 
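As a minimal sketch of the baseband model (2.1), the following Python fragment passes symbols through a fixed channel coefficient and adds circularly-symmetric Gaussian noise. The function names, the QPSK constellation, and the numerical values of P, N_0, and h are illustrative assumptions and do not come from the thesis.

```python
import random

def simulate_received(h, symbols, noise_power, rng):
    """Apply y = conj(h) * x + w to each transmitted symbol x."""
    # Circular symmetry: real and imaginary parts each carry half the variance.
    sigma = (noise_power / 2.0) ** 0.5
    received = []
    for x in symbols:
        w = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        received.append(h.conjugate() * x + w)
    return received

def qpsk_symbols(n, power, rng):
    """QPSK symbols scaled so that |x|^2 equals `power` for every symbol."""
    amp = (power / 2.0) ** 0.5
    return [complex(rng.choice((-amp, amp)), rng.choice((-amp, amp)))
            for _ in range(n)]

rng = random.Random(0)        # fixed seed for repeatability
P, N0 = 1.0, 0.1              # assumed power limit and noise variance
h = complex(0.6, -0.8)        # assumed channel coefficient with |h|^2 = 1
x = qpsk_symbols(20000, P, rng)
y = simulate_received(h, x, N0, rng)

# Subtracting the known signal part leaves the noise, whose average power
# should be close to N0.
noise_est = sum(abs(yk - h.conjugate() * xk) ** 2
                for xk, yk in zip(x, y)) / len(x)
print(noise_est)
```

Here the power constraint is met with equality on every symbol, which is stricter than the expected-power constraint in the text; a Gaussian input would satisfy it only on average.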

The  fading  coefficient  h  itself  is  a  random  variable  that,  depending  on  the  channel 
environment,  can  be  modeled  with  various  distributions.  We  will  most  often  employ 
the  commonly-used  Rayleigh  model,  where  the  real  and  imaginary  components  of  h 
have independent, zero-mean Gaussian distributions. Equivalently, the magnitude
of h has a Rayleigh distribution (and its square magnitude has a distribution that
is equivalently exponential, chi-square with two degrees of freedom, or first-order
Erlang),  while  the  phase  has  a  uniform  distribution.  This  is  valid  when  there  are  a 
large  number  of  scatterers  and  no  direct  line  of  sight  between  transmitter  and  receiver, 
and  accurately  models  many  indoor  or  urban  environments.  The  coherence  time  is 
the  duration  over  which  h  stays  approximately  constant.  One  popular  model  places 
the  coherence  time  at  about  [54] 

$$ T_c \approx \frac{0.423\,\lambda}{v}, $$

where $\lambda$ is the wavelength of the signal and v is the speed of the receiver. For example,
the coherence time for a receiver traveling at 60 miles per hour with a 900 MHz
signal will be about 5.3 ms. However, even when both transmitter and receiver are
stationary, the fading will typically exhibit some time variation. Whether the fading
stays  constant  or  varies  over  a  block  of  symbols  depends  on  the  physical  parameters 
and signaling format. The current cellular and cordless phone standards D-AMPS,
GSM, and DECT use block durations on the order of hundreds of microseconds to
several  milliseconds,  but  sometimes  also  interleave  over  several  blocks. 
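The coherence-time rule of thumb above is easy to evaluate numerically. The short Python sketch below does so for the vehicle example in the text; the function name and the unit-conversion constant are illustrative choices, and the formula is only the approximate model quoted from [54].

```python
SPEED_OF_LIGHT = 299_792_458.0   # metres per second
MPH_TO_MPS = 0.44704             # miles per hour to metres per second

def coherence_time(carrier_hz, speed_mps):
    """Rule-of-thumb coherence time 0.423 * wavelength / speed, in seconds."""
    wavelength = SPEED_OF_LIGHT / carrier_hz
    return 0.423 * wavelength / speed_mps

# Receiver moving at 60 miles per hour with a 900 MHz carrier.
t_c = coherence_time(900e6, 60 * MPH_TO_MPS)
print(f"coherence time: {t_c * 1e3:.2f} ms")
```

Comparing this value against a standard's block duration gives a quick indication of whether the fading can be treated as constant within a block.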
2.2.1  Performance Measures


There  are  two  basic  approaches  toward  communicating  over  fading  channels.  If  the 
fading  coefficient  changes  relatively  slowly  with  time,  then  signaling  can  be  performed 
within  what  is  essentially  a  single  fade  of  random  quality.  At  the  other  extreme, 
one  can  signal  across  more  and  more  fades  and  achieve  an  overall  performance  that 
typically  becomes  deterministic.  Different  performance  criteria  are  appropriate  for 
these  two  scenarios;  we  shall  designate  these  criteria  as  outage  probability  and  ergodic 
capacity. 

In  either  case,  the  algorithms  and  performance  that  are  available  will  also  depend 
on  whether  one  or  both  sides  have  knowledge  of  the  fading  coefficients.  Receiver 
knowledge is a fairly common assumption and is possible through training, a separate
pilot channel, and/or adaptive algorithms during the data phase itself. Most
current  wireless  standards  include  mechanisms  for  this  type  of  channel  estimation. 
Consequently,  we  will  assume  perfect  receiver  knowledge  unless  specified  otherwise. 

By  contrast,  transmitter  knowledge  (also  called  side  information)  is  typically  more 
difficult  to  obtain  and  in  some  situations  is  considered  to  be  less  crucial.  However, 
we  will  see  that  for  multiple-receiver  systems,  this  knowledge  is  very  important  to 
fulfilling  the  potential  of  the  array.  The  transmitter  can  attain  this  side  information 
in two ways. First, the receiver may relay its information through a separate feedback
channel. Alternatively, if data is being exchanged in both directions over the
same  frequency  band,  such  as  in  time  division  duplex  (TDD)  systems,  then  channel 
estimates  made  for  the  reverse  channel  will  be  valid  in  the  downstream  direction  as 
well. 

Characterizing  performance  by  outage  and  ergodic  capacity  is  not  new,  although 
most  authors  choose  one  form  or  the  other.  An  exception  is  the  diversity-multiplexing 
tradeoff  expressed  by  Zheng  and  Tse  [87].  Comparisons  to  our  scheme  may  be  useful 
to  keep  in  mind,  and  will  become  clearer  with  the  spatial  multiplexing  techniques  of 
Section  2.3.2.  However,  care  must  be  taken  in  understanding  the  different  contexts 
in which the two frameworks come up. First, Zheng and Tse deal with a transmitter that
does not have channel knowledge. As discussed above, this will lead to a different set
of achievable operating points. Furthermore, we will see that this leads to very different
ideas of outage and error. Secondly, we consider low-complexity, uncoordinated
receivers, so that the performance at the individual receivers becomes as important
as the aggregate total. Capturing this new tradeoff will be addressed throughout the
thesis.


Outage  Probability 

If  the  channel  coefficient  h  is  known  at  both  sides  and  is  constant  for  the  time  span 
of  interest,  then  the  fading  channel  model  (2.1)  takes  the  form  of  an  additive  white 
Gaussian noise channel with received signal to noise ratio (SNR)

$$ \mathrm{SNR}_{\mathrm{rec}} = \frac{|h|^2 P}{N_0}. $$

Since  both  coded  and  uncoded  techniques  for  this  channel  are  well-developed  and 
depend  only  on  this  measure,  we  can  capture  the  performance  over  random  fading 
with an outage probability curve, which we define here as

$$ \mathrm{Pr}_{\mathrm{outage}} = \Pr\{\mathrm{SNR}_{\mathrm{rec}} < \mathrm{SNR}_0\}, \qquad (2.2) $$

where $\mathrm{SNR}_0$ is a parameter that can take on any nonnegative value, and is usually
given in units of dB, equal to $10 \log_{10} \mathrm{SNR}_0$. This curve is also equal to the cumulative
distribution function (CDF) of received SNR over the fading channel ensemble.
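To make the definition in (2.2) concrete, the Python sketch below estimates one point of the outage curve by Monte Carlo simulation for a single Rayleigh-faded link and compares it with the exponential CDF. It assumes the normalization E[|h|^2] = 1, so that the mean received SNR equals the ratio of transmit power to noise power; the function names and parameter values are illustrative.

```python
import math
import random

def rayleigh_snr_samples(mean_snr, n, rng):
    """Draw received SNRs for Rayleigh fading normalized to E[|h|^2] = 1."""
    sigma = math.sqrt(0.5)   # real and imaginary parts each have variance 1/2
    samples = []
    for _ in range(n):
        h = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        samples.append(mean_snr * abs(h) ** 2)
    return samples

def outage_probability(snr_samples, snr0_db):
    """Empirical Pr{SNR_rec < SNR_0}, with the threshold given in dB."""
    snr0 = 10.0 ** (snr0_db / 10.0)
    return sum(s < snr0 for s in snr_samples) / len(snr_samples)

rng = random.Random(1)
mean_snr = 10.0              # 10 dB average received SNR (assumed)
samples = rayleigh_snr_samples(mean_snr, 100_000, rng)

empirical = outage_probability(samples, snr0_db=0.0)
# |h|^2 is exponential with unit mean, so the CDF is 1 - exp(-SNR_0 / mean).
exact = 1.0 - math.exp(-1.0 / mean_snr)
print(empirical, exact)
```

Sweeping `snr0_db` over a range of thresholds traces out the full outage curve, which is simply the empirical CDF of the samples.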

The outage curve can be considered as a measure of the reliability of communication.
For any target $\mathrm{SNR}_0$, the outage curve will show the probability that the target
will not be met. Perhaps more in tune with the goals of a system designer, the curve can
also  provide  the  appropriate  SNR  operating  point  if  a  target  outage  probability  is  to 
be  met.  Usually,  a  fairly  small  level  such  as  10%  or  1%  outage  or  lower  is  desired.  For 
this  reason,  it  may  be  equally  or  more  important  to  have  a  probability  distribution 
with  short  tails  than  one  with  a  large  mean.  In  the  next  section,  we  will  see  how 
the  use  of  an  array  can  concentrate  the  SNR  distribution  around  its  mean  and  thus 
create  a  more  desirable  channel. 

For coded systems, a key quantity is the mutual information of the channel, which
in this case evaluates to the rate

$$ \log_2\left(1 + \frac{|h|^2 P}{N_0}\right) \qquad (2.3) $$

in bits per channel use when given an optimal (Gaussian) input distribution. If the
channel coefficient stays constant for long enough, this mutual information represents
a maximum reliable rate of communication. In principle, this rate can then be
approached using the same coding and shaping techniques that have been so successful
in the additive white Gaussian noise channel, including trellis coding, turbo coding
and shell mapping [22, 5, 43]. Therefore, we could have defined our outage in terms
of a cumulative distribution on this rate instead of received SNR. We choose the SNR
version because it is also valid for uncoded systems and because scaling by a different
transmitted power P will only result in a horizontal shift in the outage curve (when
plotted in dB).

The instantaneous rate in (2.3) brings up an important difference between our
model and one where the transmitter does not have knowledge of the channel
coefficients. Without side information, the transmitter will not know at what rate it can
reliably encode data. Outage probabilities are still well-defined, and it was in this
context that they were first introduced by Ozarow, et al. [52]. Now, however, an
outage event means a failure without the opportunity to lower the rate to a level that
is known to be achievable. An alternate characterization, used by Zheng and Tse [87]
as well as many other authors (e.g., [61, 34]) comes about from letting the transmitter
choose a fixed modulation and coding scheme and then computing probability of
bitwise or codeword error over the ensemble of possible channel realizations. The error
rate can be shown graphically for different transmitted powers P. This graph will
be closely related to our outage curves because error events of this kind are generally
dominated by low-quality channel realizations. However, we will tend to avoid this
perspective because a transmitter that has channel knowledge will be able to adapt
its modulation and coding scheme (or choose not to send at all) depending on the
realized channel.


Ergodic  Capacity 

If  the  channel  coefficient  h  varies  ergodically  over  time,  then  one  could  signal  across 
these variations and hope to achieve a reliable average performance. It turns out that
this idea can be made precise for a variety of situations. We concentrate on coded
performance here, although systems also exist that result in deterministic uncoded
performance  [80]. 

Consider a coded system where the transmitter has knowledge of h at each time
instant. The system achieves the rate in (2.3) over each realization, resulting
asymptotically in an average rate

$$ C_{\mathrm{ergodic}} = E\left[\log_2\left(1 + \frac{|h|^2 P}{N_0}\right)\right] \qquad (2.4) $$

that  is  deterministic.  This  performance,  which  we  will  call  the  ergodic  capacity, 
depends  only  on  the  distribution  of  h  and  not  on  its  particular  evolution  in  time.  We 
will  see  below  that  this  rate  is  achievable  even  when  the  channel  varies  too  quickly  to 
send  codewords  within  each  individual  fade.  Some  authors  refer  to  ergodic  capacity 
as  the  (average)  throughput. 

Unlike  the  outage  probability  curve,  which  is  a  distribution,  the  ergodic  capacity 
results  in  a  single  number.  For  a  given  fading  distribution,  this  number  depends  only 
on the input signal to noise ratio,

$$ \mathrm{SNR}_{\mathrm{input}} = \frac{P}{N_0}. $$


From the concavity of the function $\log_2(1 + x)$, ergodic capacity must be smaller than
that  of  a  static  channel  with  the  same  input  SNR,  but  the  penalty  turns  out  not  to 
be  too  severe  for  most  fading  distributions.  For  example,  with  Rayleigh  fading  at  an 
input  SNR  of  0  dB,  the  ergodic  capacity  is  0.86  bits/channel  use,  as  opposed  to  1 
bit/channel  use  for  a  corresponding  static  channel. 
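The Rayleigh figure quoted above can be checked by integrating the instantaneous rate against the exponential density of |h|^2. The following Python sketch uses a simple midpoint rule; it assumes the normalization E[|h|^2] = 1, and the function name, integration limit, and step count are illustrative choices.

```python
import math

def ergodic_capacity_rayleigh(input_snr, upper=60.0, steps=200_000):
    """Approximate E[log2(1 + input_snr * g)] for g exponential with unit mean.

    Midpoint-rule integration over [0, upper]; the neglected tail beyond
    `upper` is weighted by exp(-upper) and is negligible for upper = 60.
    """
    dg = upper / steps
    total = 0.0
    for i in range(steps):
        g = (i + 0.5) * dg
        total += math.log2(1.0 + input_snr * g) * math.exp(-g) * dg
    return total

input_snr = 1.0                               # 0 dB
c_fading = ergodic_capacity_rayleigh(input_snr)
c_static = math.log2(1.0 + input_snr)         # static channel, same input SNR
print(round(c_fading, 2), c_static)
```

At 0 dB this reproduces roughly 0.86 bits per channel use against 1 bit for the static channel, and the gap stays nonnegative at other SNRs, as the concavity argument requires.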

Perhaps  surprisingly,  the  same  rate  in  (2.4)  is  achievable  when  the  transmitter 
does  not  have  complete  channel  knowledge,  but  knows  only  the  statistics  of  h  and 
the  input  SNR.  This  follows  because  the  ergodic  capacity  can  also  be  achieved  using 
a  constant-rate  code,  as  long  as  the  codeword  symbols  are  interleaved  across  many 
channel  realizations.  Later  we  will  find  that  with  multiple-element  transmit  arrays, 
the  ergodic  capacity  will  become  higher  with  side  information  than  without. 

Instead of the rate in (2.4), some authors define the capacity with side information
to be a somewhat higher number achieved through a procedure called temporal
waterfilling. To resolve this issue, recall that in our power constraint, a limit is placed
on the expected power of each symbol x[n]. One might call this a peak power
constraint (in the stochastic sense; a particular realized value of x[n] may have power
that is higher than P). A somewhat looser, average power constraint would allow the
transmitter to send some symbols with higher power than others, as long as the time
average remains below P. A transmitter with channel knowledge will then use more
power on stronger channel realizations, “pouring” power over the inverse of SNR [14],


$V[n] = V \cdot \left[\lambda - \frac{N_0}{V\,|h[n]|^2}\right]^+$

where the Lagrange multiplier parameter $\lambda$ is chosen to satisfy the average power
constraint and $[a]^+ = \max(a, 0)$. Waterfilling can be used to solve a variety of parallel
channel problems, and will show up again later in this role.
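The waterfilling rule can be sketched for a finite set of channel realizations. The code below is a minimal illustration, assuming that the per-realization power is the average power times $[\lambda - 1/\mathrm{SNR}]^+$ and that $\lambda$ is found by bisection. The function name `waterfill`, the bisection bracket, and the sample gains are hypothetical and not taken from the thesis.

```python
def waterfill(gains, avg_power, noise=1.0, tol=1e-12):
    """Allocate power over channel realizations with gains |h[n]|^2.

    Power for realization n is avg_power * max(lam - noise / (avg_power * g), 0),
    and lam is chosen by bisection so that the mean allocated power
    equals avg_power. All gains must be strictly positive.
    """
    inv_snr = [noise / (avg_power * g) for g in gains]

    def mean_power(lam):
        return avg_power * sum(max(lam - s, 0.0) for s in inv_snr) / len(inv_snr)

    # At lam = 1 + max(inv_snr) every term is at least 1, so the mean is large enough.
    lo, hi = 0.0, 1.0 + max(inv_snr)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_power(mid) > avg_power:
            hi = mid
        else:
            lo = mid
    lam = 0.5 * (lo + hi)
    return [avg_power * max(lam - s, 0.0) for s in inv_snr], lam


powers, lam = waterfill([0.1, 0.5, 1.0, 2.0], avg_power=1.0)
print(powers, lam)
```

In this example the weakest realization receives no power at all, while the strongest receives the most, which is the "pouring" behavior described above.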


2.3  Transmitter  Antenna  Arrays 

Our  main  results  consider  a  transmitter  antenna  array  and  multiple  receivers,  bringing 
an  increased  complexity  to  both  the  channel  model  and  the  different  approaches  a 
system  may  use. 

See  Fig.  2-1  for  a  diagram  of  the  channel  model  with  a  three-element  array  and 
three  receivers.  In  general,  the  transmitter  now  has  M  antenna  elements  from  which 
it  can  send  a  vector  of  symbols,  x.  These  signals  arrive  at  the  K  receivers  through 
a  cross-coupled  channel,  where  the  link  between  each  antenna  element  and  receiver 
is  an  independent  Rayleigh  channel  of  the  type  described  in  the  previous  section.  If 
we collect all of the fading coefficients $H_{k,m}$ into a matrix H, then this cross-coupled
channel  can  be  succinctly  modeled  as  a  matrix  multiplication, 

y  =  Hx  +  w.  (2.5) 

The power constraint now becomes $E[x^\dagger x] \le V$, so that the maximum transmitted
power  is  the  same  as  with  a  single  antenna  element. 

The  inclusion  of  the  cross-coupled  channel  has  both  positive  and  negative  effects. 
First  of  all,  the  array  provides  multiple  paths  to  each  receiver,  so  that  if  one  link 
undergoes  a  fade  of  poor  quality,  other  links  are  likely  to  be  better.  In  this  way,  a 
more  reliable  overall  channel  can  be  sustained.  This  is  an  example  of  diversity,  which 
refers  to  taking  advantage  of  multiple  paths  to  a  receiver.  For  this  to  work,  however, 
it  is  important  that  the  different  copies  be  independently  faded,  or  at  least  nearly  so. 
Whether this is true depends on the physical separation of the antenna elements in
the array, the wavelength $\lambda$, and the location of scatterers. For indoor Rayleigh
environments, for instance, the necessary separation between elements can be as small
as $\lambda/2$. This diversity-centered model is not to be confused with phased array
transmitters, which operate in a regime where the coefficients have near-perfect correlation.

Figure 2-1: Channel model for transmitter antenna array and multiple receivers.

The  potentially  harmful  effect  is  interference.  A  transmitted  data  stream  will  go 
to  all  receivers,  whether  intended  or  not.  To  deal  with  this,  various  scheduling  and 
array  processing  techniques  can  be  used.  We  begin  with  the  simplest,  which  is  to 
transmit  to  only  one  receiver  at  a  time  and  therefore  ignore  any  interference  that 
is caused. Afterward, we will consider transmitting multiple streams simultaneously
using array processing to mitigate interference, a process called spatial multiplexing.


2.3.1  Array  Processing  Techniques  Under  Timesharing 

If the scheduler only selects one stream and one intended receiver at a time, interference
becomes irrelevant. The array processor can then select a transmission scheme
based upon outage or ergodic capacity performance criteria at the intended receiver,
as well as other considerations such as complexity.

The  array  processor  must  specify  the  transformation  from  the  data  stream,  s[n], 
to the vector of antenna outputs, x[n], over the time block of interest. In general, this
may include block processing and any kind of vector coded or uncoded modulation
that satisfies the power constraint. It turns out, however, that optimal performance in
this single-receiver scenario can be achieved by separating the modulation/encoding
from the multiple antenna element considerations using a technique called beamforming.

Beamforming


Assume  that  the  data  stream  s[n]  has  been  modulated  and,  if  desired,  encoded  as 
if  for  a  scalar  additive  Gaussian  white  noise  channel.  The  array  processor  can  then 
perform  a  linear  transformation  on  each  data  symbol,  x  =  gs  for  some  set  of  weights 
g,  such  that  the  signals  from  the  different  antenna  elements  combine  coherently  at  the 
intended  receiver.  This  coherent  combining  results  in  the  maximum  possible  received 
SNR  over  each  realization,  and  is  therefore  optimal. 

We  can  study  the  performance  of  this  solution  in  more  detail.  If  the  receiver’s 
vector  of  channel  coefficients  is  h,  it  effectively  experiences  an  additive  Gaussian  white 
noise  channel  from  s[n]  with  a  received  SNR  of 


$\mathrm{SNR}_{\mathrm{rec}} = \frac{V\,|h^\dagger g|^2}{N_0}.$

This is maximized by matching the beamforming direction to the channel vector,
$g = h/\|h\|$, leading to the optimal value of

$\mathrm{SNR}_{\mathrm{rec}} = \frac{V\,\|h\|^2}{N_0}.$  (2.6)

The probability distribution of (2.6) under Rayleigh fading is an Mth-order Erlang
(or, equivalently, chi-square with 2M degrees of freedom, denoted $\chi^2_{2M}$). This has
M  times  the  mean  of  transmission  from  a  single  antenna  element,  with  considerably 
smaller  tails.  The  implications  of  this  will  become  apparent  shortly. 
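The optimality of the matched direction follows from the Cauchy-Schwarz inequality, and can be checked numerically. The sketch below is illustrative only: it assumes unit-variance complex Gaussian coefficients, and the helper names (`rayleigh_vector`, `beamforming_gain`) are hypothetical.

```python
import math
import random


def rayleigh_vector(m, rng):
    """Draw m i.i.d. complex Gaussian coefficients with E[|h|^2] = 1 (assumed)."""
    s = math.sqrt(0.5)
    return [complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(m)]


def norm(v):
    return math.sqrt(sum(abs(x) ** 2 for x in v))


def beamforming_gain(h, g):
    """Return |h^dagger g|^2, the factor multiplying V / N_0 in the received SNR."""
    return abs(sum(hk.conjugate() * gk for hk, gk in zip(h, g))) ** 2


rng = random.Random(0)
h = rayleigh_vector(4, rng)
g_matched = [x / norm(h) for x in h]   # g = h / ||h||
g_single = [1.0, 0.0, 0.0, 0.0]        # all power on one antenna element

print(beamforming_gain(h, g_matched), norm(h) ** 2)  # these two agree
print(beamforming_gain(h, g_single))                 # gain |h_1|^2 of a single link
```

Any other unit-norm weight vector yields a gain no larger than $\|h\|^2$.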

We  plot  ergodic  capacity  and  outage  probability  for  several  scenarios  in  Fig.  2-2 
and  Fig.  2-3,  respectively.  For  normalization,  we  define  an  “input  SNR  per  link”  as 


Input  SNR  per  link  = 

A/o 

This  value  will  usually  be  set  at  5  dB  in  our  examples,  as  this  leads  to  reasonable 
coded  rates  in  multiuser  scenarios  and  is  within  the  usual  operating  range  given  in 
the  literature. 

Figure 2-2: Ergodic capacities for an M-element transmit array and input SNR per
link of 5 dB. We show the curve for Rayleigh fading, and for comparison the additive
white Gaussian noise channel that does not encounter fading.

The ergodic capacity improves with the number of antenna elements, mainly because
of the increase in mean received SNR. Once again, we see that the random
channel variations often do not decrease ergodic capacity significantly. On the other
hand, the shape of the fading distribution is very important when signaling over single
fading realizations. In Fig. 2-3, we see how this effect can dramatically affect
the outage characteristic. At 1% outage, adding a second antenna element results
in a gain of over 10 dB, even though the mean only doubles (3 dB). Note also the
diminishing returns that are typical of diversity techniques; most gains occur as the
first few antenna elements are added.
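One way to read the 10 dB figure is as the ratio of the channel gains $\|h\|^2$ that are exceeded 99% of the time with two elements versus one. The sketch below computes this from the Erlang distribution function, assuming unit-variance links. The function names and the bisection bracket are illustrative assumptions.

```python
import math


def erlang_cdf(t, m):
    """P(||h||^2 <= t) for m i.i.d. unit-variance Rayleigh links (Mth-order Erlang)."""
    return 1.0 - math.exp(-t) * sum(t ** k / math.factorial(k) for k in range(m))


def gain_at_outage(p_out, m, lo=0.0, hi=100.0):
    """Find by bisection the value t with P(||h||^2 <= t) = p_out."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if erlang_cdf(mid, m) < p_out:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def diversity_gain_db(p_out, m):
    """Gain in dB at outage level p_out of m elements over a single element."""
    return 10.0 * math.log10(gain_at_outage(p_out, m) / gain_at_outage(p_out, 1))


for m in (2, 3, 4):
    print(m, round(diversity_gain_db(0.01, m), 1))
```

The increments shrink as elements are added, which matches the diminishing returns noted above.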


Space-Time  Coding 

The performance curves above require the transmitter to have knowledge of the channel
parameters. Even if there is a small amount of uncertainty in the channel measurement,
it turns out that beamforming is still optimal from the point of view of
maximizing channel capacity [49, 72] or expected received SNR. However, when the
transmitter does not have access to channel information, beamforming in any single
direction results in the same distribution as with single-element transmission. We will
also see in Chapter 5 that channel information becomes less useful when a stream is
intended for multiple receivers, because the transmitter cannot direct data to all of
them simultaneously. In these cases, more complex implementations may be useful
and are often given the general heading of space-time codes.

Figure 2-3: Single-receiver outage probabilities for a transmit antenna array with
M = 1, 2, 3, and 4 elements, using a Rayleigh fading model at an input SNR per link
of 5 dB.

The  transformation  between  the  data  stream  s[n]  and  the  antenna  outputs  x[n] 
can  take  a  number  of  forms.  One  common  element  to  space-time  codes  is  that  the 
covariance matrix $E[xx^\dagger]$ has rank above one; the vector of antenna element outputs
at  a  particular  time  contains  information  from  more  than  one  input  symbol.  In  fact, 
the  ergodic  capacity  is  maximized  by  letting  this  covariance  be  a  scaled  identity  [63]. 
Practical  implementations  include  transformations  resembling  either  convolutional 
[61,  34]  or  block  encoders  [1,  62].  In  some  special  cases,  as  well  as  under  idealized 
assumptions,  these  techniques  are  able  to  achieve  performance  equivalent  to  a  received 
SNR distribution that is Mth order Erlang, but they sacrifice a factor of M in mean
SNR  compared  with  beamforming  under  perfect  channel  knowledge. 

Table 2.1: Maximum achievable coded rates for sending distinct streams to two receivers
using different multiplexing methods. All methods use the same symbol duration
and bandwidth, and spatial multiplexing assumes a best-case scenario with
orthogonal channel vectors $h_1$ and $h_2$. The parameter $\alpha$ represents the fraction of
time (for timesharing) or power (for CDMA or spatial multiplexing) devoted to the
first receiver.

2.3.2  Spatial  Multiplexing  of  Multiple  Streams 

The scheduler also has the option of sending multiple streams simultaneously. Interference
then becomes an issue, but if it can be dealt with effectively, spatial multiplexing
has several potential advantages. Among these are:

• Increased Performance: We illustrate the potential improvement using a coded
system example where distinct streams are directed to their intended receivers
using the type of single-user beamforming described above. In the best-case
scenario where the rows of the channel matrix H are orthogonal, the transmitter
can send the streams simultaneously without incurring any interference.
With this assumption, Table 2.1 compares the maximum achievable rates to two
receivers for timesharing and spatial multiplexing, as well as a third technique,
code division multiple access (CDMA), whereby the streams are modulated over
linearly independent waveforms. (Actual CDMA systems usually operate in a
wideband regime under different channel modeling assumptions, however.) We
also plot these rate regions for a sample channel realization in Fig. 2-4.

It can be shown (using Jensen’s inequality) that spatial multiplexing over orthogonal
channels always results in the largest rate region, and that the disparity
increases as the number of antenna elements and receivers grows larger.
Looking at the formulas in the table, this improvement is reminiscent of that
achieved by increasing the bandwidth of a continuous-time channel, where the
capacity with bandwidth $W$ is $W \log_2(1 + V/(N_0 W))$. In fact, something very
similar to this is occurring: the spatial multiplexing system is able to devote its
full time-bandwidth resources to each receiver simultaneously, while timesharing
and CDMA divide these resources up among the receivers. In the extreme
case where the number of antenna elements and receivers (set M = K) grows
large and $\alpha = 1/M$, the sum rate across receivers for spatial multiplexing becomes
$M \log_2(1 + V/N_0)$. This dramatic, asymptotically linear increase with
the number of antenna elements recalls similar results when the receivers are
able to fully coordinate [63, 27].

Figure 2-4: Achievable rate regions for a sample channel realization with orthogonal
channel vectors, $\|h_1\|^2 = 1$, $\|h_2\|^2 = 1.5$, and $V/N_0 = 1$.

Of  course,  realistic  channel  matrices  will  not  often  have  orthogonal  rows,  but 
the  above  arguments  provide  motivation  for  investigating  spatial  multiplexing 
further. We will apply array processing (Chapter 3) and then scheduling (Chapter 4)
to try to approach this performance.

• Upgrade of Existing Systems: Spatial multiplexing provides a method for increasing
the number of receivers that a system can handle. In some cases, this
can be implemented into current standards with relatively few alterations, and
ideally requires only adding a few antenna elements and some additional processing
to an existing array. Alternative ways to increase system capacity, such
as purchasing additional spectrum or base stations, may be very expensive or
difficult to bring about.

•  Flexibility:  A  spatial  multiplexing  system  can  incorporate  a  great  number  of 
algorithmic  and  implementation  options.  For  example,  in  many  cases,  most  of 
the benefits of the array are available by adding sophistication to either the
scheduling  or  array  processing  task.  We  will  also  see  how  to  select  and  tune 
algorithms  to  meet  the  goals  of  different  types  of  data  streams.  Design  choices 
can  be  made  based  upon  implementation  issues  and  the  different  situations  that 
are  likely  to  come  up,  including  the  number  and  mobility  of  receivers. 
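The rate comparison in the "Increased Performance" item above can be made concrete for timesharing and spatial multiplexing. The entries of Table 2.1 are not reproduced in this text, so the two formulas in the sketch are assumptions: timesharing gives receiver 1 a fraction $\alpha$ of the time at full power, while spatial multiplexing over orthogonal channel vectors gives it a fraction $\alpha$ of the power all of the time. The numerical values follow the caption of Fig. 2-4.

```python
import math


def timesharing_rates(alpha, snr1, snr2):
    """Assumed timesharing rates: each receiver is served at full power part of the time."""
    return alpha * math.log2(1.0 + snr1), (1.0 - alpha) * math.log2(1.0 + snr2)


def spatial_multiplexing_rates(alpha, snr1, snr2):
    """Assumed rates over orthogonal channels: power is split, time is shared fully."""
    return math.log2(1.0 + alpha * snr1), math.log2(1.0 + (1.0 - alpha) * snr2)


# snr_k stands for V * ||h_k||^2 / N_0, using the values quoted for Fig. 2-4.
snr1, snr2 = 1.0, 1.5
for alpha in (0.25, 0.5, 0.75):
    print(alpha,
          timesharing_rates(alpha, snr1, snr2),
          spatial_multiplexing_rates(alpha, snr1, snr2))
```

Because $\log_2(1 + \alpha x) \ge \alpha \log_2(1 + x)$ for $0 \le \alpha \le 1$, each spatial multiplexing rate is at least the corresponding timesharing rate, which is the Jensen's inequality argument in miniature.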

To  effectively  use  spatial  multiplexing,  the  transmitter  must  deal  with  the  issue 
of  interference.  One  possible  element  of  an  interference-avoidance  strategy,  to  be 
discussed  in  Chapter  4,  is  to  design  channel-aware  schedulers  that  select  groups  of 
receivers  with  nearly  orthogonal  channel  vectors.  Even  with  this  type  of  scheduler, 
the  array  processing  block  will  likely  need  to  compensate  for  some  interference.  In 
this  thesis,  we  will  concentrate  on  so-called  “zero-forcing”  schemes  that  remove  all 
interference,  leaving  the  receivers  with  only  their  intended  signals  and  the  additive 
white noise $w_k$. For the systems we consider, and the regimes in which they operate,
this  will  lead  to  analyzable,  relatively  low-complexity  solutions  that  perform  nearly 
as  well  as  optimal  schemes.  In  Chapter  3,  we  present  a  detailed  development  of 
precoding  techniques  that  are  of  this  vein.  For  the  moment,  however,  we  briefly 
describe  a  well-known  linear  method  for  array  processing. 

Multiple-Receiver  Beamforming 

We look to extend beamforming, which was sufficient for optimality under timesharing,
to deal with multiple receivers. Once again, assume that each stream has been
modulated and, if desired, encoded as if for an additive white Gaussian noise channel.
The vector of antenna element outputs can now be selected as a linear combination
of the current symbols from all of the streams, x = Gs, where G is called the
beamforming matrix. The channel model (2.5) now specializes to


y = HGs + w.  (2.7)

If the symbols $s_k$ are independent and zero-mean with variance V, then the appropriate
power constraint on G is $\mathrm{trace}(G^\dagger G) \le 1$.


Assume  for  now  that  each  of  the  elements  in  s  is  intended  for  a  separate  receiver. 
If  any  streams  were  common  to  multiple  receivers,  the  scheduler  can  simply  duplicate 
them.  We  will  return  to  more  efficient  methods  of  multiplexing  common  information 
in  Chapter  5. 


In  selecting  the  beamforming  matrix  G,  there  is  an  inherent  tradeoff  between 
increasing  signal  power  and  reducing  interference.  The  zero-forcing  approach  is  to 
eliminate interference by finding a G for which HG is diagonal. For independent
Rayleigh fading, this can be done with probability one as long as the number of
antenna elements in the transmitter array is at least as large as the number of receivers.
The pseudoinverse produces the best such matrix in terms of maximizing the
individual SNRs, and was used by Gerlach and Paulraj [29]. Unfortunately, by concentrating
so much on interference, this solution can result in reduced signal power
at  the  receivers.  For  randomly-chosen  data  streams,  we  essentially  lose  the  effect  of 
one  of  the  transmitter  antenna  elements  for  every  receiver  that  had  to  be  nulled  out. 
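The zero-forcing construction can be sketched for two receivers using only complex arithmetic. The code below forms the right pseudoinverse $H^\dagger (H H^\dagger)^{-1}$ and rescales it to meet $\mathrm{trace}(G^\dagger G) = 1$. The common scaling across streams, the restriction to two receivers, and all helper names are assumptions for illustration; this is not presented as the method of [29].

```python
import math
import random


def dagger(a):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[a[i][j].conjugate() for i in range(len(a))] for j in range(len(a[0]))]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def inverse_2x2(a):
    (p, q), (r, s) = a
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]


def zero_forcing_beamformer(h):
    """Return G with HG diagonal and trace(G^dagger G) = 1, for a 2 x M matrix H."""
    hd = dagger(h)
    g = matmul(hd, inverse_2x2(matmul(h, hd)))
    scale = math.sqrt(sum(abs(x) ** 2 for row in g for x in row))
    return [[x / scale for x in row] for row in g]


rng = random.Random(1)
s = math.sqrt(0.5)
H = [[complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(3)] for _ in range(2)]
G = zero_forcing_beamformer(H)
HG = matmul(H, G)
print([[round(abs(x), 6) for x in row] for row in HG])
```

The off-diagonal entries of HG vanish, so each receiver sees only its own stream. The common diagonal gain shows how much signal power remains after the nulls have been placed.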


Other  useful  beamforming  strategies  exist.  One  can  optimize  received  signal  power 
by setting G proportional to $H^\dagger$, often at the expense of high interference. A balance
between this “matched filter” solution and zero forcing would be to maximize the
signal-to-interference-plus-noise ratio (SINR). This is particularly useful when the
interference is close to Gaussian distributed. Rashid-Farrokhi et al. [55] found a
solution (later refined by Visotsky and Madhow [73]) for reaching specified SINR
levels  at  each  receiver  with  the  minimum  total  transmit  power.  Unfortunately,  the 
form  was  of  an  iterative  algorithm,  and  would  require  even  more  iterations  to  map  it 
to  a  power  constraint  rather  than  SINR  constraints.  For  the  less  ambitious  problem 
of  power  control  to  equalize  SINRs  given  a  set  of  beamforming  directions,  an  analytic 
solution  was  found  by  Yang  and  Xu  [81]. 

2.3.3 Coordinated Versus Uncoordinated Receivers

Our basic model of a base station and several low-complexity, geographically separated
receivers naturally makes it difficult for these receivers to achieve a large amount
of  coordination.  Therefore,  we  have  assumed  that  they  have  no  knowledge  of  each 
other’s  received  signals.  Before  going  on,  however,  it  may  be  useful  to  say  a  few  words 
about  what  is  possible  when  they  do  coordinate. 

A  variation  on  the  model  (2.5)  would  be  for  a  single  receiver  to  have  access  to  all  K 
antenna  outputs.  The  usual  application  would  be  if  all  of  the  receive  antenna  elements 
were  located  within  a  single  array.  The  purpose  then  is  to  simply  communicate  as 
much  total  information  as  possible,  rather  than  dividing  the  information  into  separate 
streams  for  the  different  receivers.  We  briefly  summarize  some  information  theoretic 
results  for  coded  systems. 

When  both  transmitter  and  receiver  know  the  channel  matrix  H,  the  transmitter 
should send on the principal directions of H and waterfill over the singular values
[63].  Note  that  this  requires  both  transmitter  and  receiver  to  use  beamforming. 

When  only  the  receiver  has  channel  information,  capacity  can  be  achieved  when 
the  elements  of  x  are  i.i.d.  over  both  space  and  time  [63,  27].  The  capacity  is 
then asymptotically proportional to min(M, K) at high SNR. If M = K, then this
represents an asymptotically linear growth in capacity with the number of antenna
elements at each end, a result that has generated much excitement in the field. Simplified
receivers that strip off and decode one layer of x at a time do not seem to lose
much over the theoretical capacity [25, 3]. Recent results, though, have shown that
the  linear  growth  in  min(M,  K)  at  high  SNR  relies  heavily  on  having  perfect  channel 
knowledge  at  the  receiver  and  may  not  hold  up  to  more  realistic  assumptions  [41]. 

If  neither  the  receiver  nor  the  transmitter  knows  the  channel,  then  i.i.d.  symbols 
over  time  will  not  suffice.  All  information  must  now  be  contained  in  the  correlations 
between  symbols.  This  type  of  signaling,  then,  relies  on  the  channel  not  changing 
too  quickly,  so  researchers  often  choose  a  block  constant  fading  model.  This  channel 
has  been  studied  by  Marzetta  and  Hochwald  in  [45]  and  subsequent  papers  that 
investigated specific coding schemes. A geometrical perspective is given by Zheng
and  Tse  [86],  including  a  study  of  the  relationship  between  the  length  of  the  block 
fade  and  the  number  of  antenna  elements  that  can  be  used  effectively. 

We will find that, with the proper scheduling and array processing, systems without
receiver coordination will often be able to achieve most of the ergodic sum capacity
that is possible with coordination. It is important to remember, however, that this is
not  our  only  goal.  Systems  of  the  type  we  examine  must  also  consider,  for  instance, 
balancing  the  requirements  of  the  individual  data  streams,  directing  them  to  single 
or  multiple  receivers,  and  doing  this  all  with  reasonable  complexity  and  robustness. 

Chapter 3


Precoding with Simple Scheduling


Our first in-depth investigation comes at the array processing level, as the transmitter
attempts  to  direct  multiple  streams  to  their  respective  receivers  simultaneously.  Very 
recently,  precoding-based  approaches  to  this  problem  have  appeared  in  the  literature 
that  show  great  promise  [7,  30].  Yet  much  work  remains  in  understanding  their 
properties,  performance,  and  implementations.  In  this  chapter,  we  place  precoding  in 
perspective  within  a  general  matrix-based  model,  and  investigate  some  of  the  design 
choices  involved  with  different  types  of  data,  modulation,  and  channel  models.  In 
the  process,  we  add  several  extensions  and  implementation  algorithms  to  the  basic 
precoding  structure. 

The  main  precoding  algorithm,  as  applied  to  cross-coupled  matrix  channels,  can 
be  understood  as  a  refinement  of  the  linear  zero-forcing  approach  described  previously. 
Instead  of  diagonalizing  the  channel  matrix  (thus  eliminating  interference)  in  one  step, 
precoding  adds  an  intermediate  triangularization.  The  residual  interference  is  then 
dealt  with  using  a  more  complicated  operation  that  combines  linear  and  nonlinear 
elements,  and  often  results  in  much  higher  overall  performance.  For  example,  the 
ergodic sum capacity across receivers for precoding can be several times that of zero-forcing
beamforming or timesharing. Even more, this general family of precoding
algorithms  has  been  shown  to  achieve  the  maximum  sum  rate  of  any  method  for  this 
channel  [82,  71,  75].  In  Section  3.1,  we  describe  this  view  of  precoding  and  then 
characterize  its  performance  and  connection  with  other  partitioned  approaches  such 
as  BLAST  [25]. 

Section 3.2 is concerned with issues that come up when applying precoding to systems.
These include organizing the processing to meet different performance criteria,
finding low-complexity modulation techniques to eliminate interference, and a preliminary
consideration of robustness to imperfect channel information. By exploring
such  issues,  we  hope  to  begin  bridging  the  gap  between  describing  what  is  possible 
and  addressing  design  choices  for  particular  systems. 

In  Section  3.3,  we  generalize  precoding  to  compensate  for  interference  across  both 
different  streams  and  time.  By  considering  a  matrix  transfer  function,  we  determine 
the types of processing that should be done. We find that there is more than one
possibility, depending on the ordering of the interference cancellation that is to be
done. We also compare our algorithms with the discrete multitone-based method of
Ginis and Cioffi [30], which converts the matrix intersymbol interference channel into
a  number  of  parallel  flat  channels  with  only  multiuser  interference. 

As a final note, the discussions of this chapter should be taken in two ways. First
is the spatial precoder’s value in dealing with the narrowly-focused array processing
problem at hand. Second is its use as one of many building blocks within a larger
system, where a large number of streams are communicated with different requirements
over time-varying channels. We will deal more with this second, higher-level
view  as  we  consider  the  impact  of  scheduling  later  in  the  thesis. 


3.1  Precoding  for  Multiuser  Communications 

In this first section, we bring together results on precoding using a framework that
emphasizes partitioning and matrix-based operations. Our development proceeds
through the elements of such a system, from linear processing to multidimensional
coding techniques. In a natural way, it highlights the importance of ordered interference,
the range of precoding options that are available for a general multiple-receiver
model, and how these relate to other types of array processing. We also set up results
in later sections on implementation and combined multiuser and intersymbol
interference. 

Throughout,  we  assume  a  simple  scheduling  algorithm  that  selects  random  or 
sequential  groups  of  streams  for  spatial  multiplexing,  and  furthermore  duplicates  any 
streams  that  are  intended  for  multiple  recipients.  We  will  consider  more  sophisticated 
schedulers  in  Chapter  4  and  more  efficient  multicast  approaches  in  Chapter  5. 

3.1.1 Precoding for Triangular Channels

We  first  describe  precoding  for  situations  where  the  channel  matrix  is  triangular. 
This  allows  us  to  apply  existing  results  on  layered  interference,  and  will  prove  to  be  a 
vital  step  in  dealing  with  arbitrary  channel  matrices.  With  this  later  use  in  mind,  we 
formulate  precoding  in  somewhat  unorthodox  terms  as  a  matrix  inverse  intertwined 
with  additional,  nonlinear  operations. 

Precoding relies on an implied ordering in the way symbols or data streams interfere.
Recall that in our channel model,

y  =  Hx  +  w,  (3.1) 

the channel matrix of fading coefficients, H, represents the transformation from antenna
array outputs to receiver outputs, before white Gaussian noise is added. If this
matrix is lower triangular and x is simply the vector of data stream symbols, then
receivers only get nonzero power from their own stream and those indexed earlier
within the vector x. If the transmitter processes the streams in this indexed order, it
will know a priori what interference is to be expected, and can precompensate for this
known interference. This type of approach first appeared as Tomlinson-Harashima
(TH) precoding [64, 36, 47] over the intersymbol interference (ISI) channel, where a
single stream exhibits self-interference across time. More recently, researchers have
used ideas from Costa’s “writing on dirty paper” [13] to refine precoding and apply
it  to  many  other  problems,  such  as  information  embedding  and  digital  watermarking 
(see  [84]  and  references  therein).  In  most  cases,  this  dirty-paper  encoding  and  its 
various  implementations  [10,  20]  can  achieve  the  same  coded  rates  as  without  any 
interference;  i.e.,  had  the  off-diagonal  elements  of  H  been  set  to  zero.  Caire  and 
Shamai  [7]  and  Ginis  and  Cioffi  [30]  then  applied  these  ideas  to  the  matrix  channel 
with  arbitrary  H  matrix  by  introducing  the  additional  triangularization  step. 

The  intersymbol  interference  channel  can  be  interpreted  as  a  triangular  matrix 
channel  with  special  structure,  and  serves  as  a  useful  starting  point  for  our  discussion. 
Consider  a  discrete-time,  linear  time-invariant  channel, 

y[n]  =  h[n]  *  x[n]  +  w[n],  (3.2) 

with  a  causal,  monic,  minimum-phase  impulse  response  h[n].  If  we  convert  the  input, 
output, and noise sequences to vectors, then (3.2) can be written as a lower-triangular
matrix channel (3.1). For example, the convolution matrix H for three data symbols
and an impulse response of length 2 will have the form

$H = \begin{bmatrix} 1 & 0 & 0 \\ h[1] & 1 & 0 \\ 0 & h[1] & 1 \end{bmatrix},$

where h[0] = 1 because the channel response was assumed to be monic.

Figure 3-1: Transmitter for TH precoding system.
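The correspondence between the convolution in (3.2) and the lower-triangular matrix in (3.1) can be checked directly. The following sketch builds the convolution matrix for a causal impulse response and compares the matrix product against a direct convolution. The function names and the sample tap value 0.5 are illustrative choices only.

```python
def convolution_matrix(h, n):
    """Return the n x n lower-triangular Toeplitz matrix with entries H[i][j] = h[i - j]."""
    return [[h[i - j] if 0 <= i - j < len(h) else 0 for j in range(n)]
            for i in range(n)]


def matvec(mat, x):
    return [sum(row[j] * x[j] for j in range(len(x))) for row in mat]


def causal_convolution(h, x):
    """First len(x) outputs of h[n] * x[n], with x assumed zero before time 0."""
    return [sum(h[k] * x[i - k] for k in range(len(h)) if i - k >= 0)
            for i in range(len(x))]


h = [1, 0.5]          # monic response of length 2, with h[1] = 0.5 as an example
x = [3, -1, 1]        # three data symbols
H = convolution_matrix(h, len(x))
print(H)
print(matvec(H, x), causal_convolution(h, x))
```

The two output vectors agree, and the unit diagonal of H reflects the monic assumption.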

Suppose that the transmitter uses uncoded $A^2$-QAM modulation, where A is an
even integer, and wishes to eliminate interference. (Extensions to odd A are straightforward.)
The real and imaginary parts of each input symbol, s[n], will therefore take
on values from among


$\{-(A-1)C,\ -(A-3)C,\ \ldots,\ (A-3)C,\ (A-1)C\},$


where  C  is  a  real  constant  chosen  so  that  the  transmitted  symbols  obey  the  power 
constraint.  The  TH  precoding  system  of  Fig.  3-1  has  a  feedback  loop  to  determine 
what  the  interference  would  have  been  for  each  symbol,  then  subtracts  this  amount 
off  to  produce  a  net  effect  of  zero  interference.  This  subtraction  can  result  in  symbols 
with  large  energy,  so  a  modulo  operation  is  performed  to  correct  for  this.  The  receiver 
will  also  have  to  compensate  for  this  correction,  as  we  describe  below. 

To understand this system further, and to connect it to our matrix model, consider
the function of the modulo operation. This box shifts the real and imaginary
components of its input until both are in the range $(-AC, AC]$. In other words, it
adds $2AC \cdot m[n]$ to the input, where $m[n]$ is the unique complex integer such that the
output is in the square region $\mathcal{A} = \{(-AC, AC] \times (-AC, AC]\}$, which we denote as the
fundamental region of the complex plane with respect to this modulo. If m[n] were
known in advance, this addition could have been performed before the feedback path
is subtracted, resulting in a modulo-equivalent version of the input,

$\tilde{s}[n] = s[n] + 2AC \cdot m[n],$

a process known as constellation expansion. If we consider the entire vector of
modulo-equivalent input symbols, then the remainder of the feedback loop is equivalent to a
matrix inverse and the precoder takes the form shown in Fig. 3-2. Note that since we
have assumed that the diagonal elements of H, and therefore of $H^{-1}$, are unity, the
outputs of the precoder are in the same fundamental region as its inputs. Therefore,
to first order, the precoder conserves the energy of the input symbols. We will see
in Section 3.1.3 that under closer inspection, there is a “precoding power loss” that
becomes noticeable for low-order modulation [23], but can be compensated for by
allowing a small amount of interference through.

Figure 3-2: TH precoding system using matrix model. The last box is a “slicer,”
which implements nearest-neighbor detection on a modulo-extended constellation.

The received vector is a noisy version of the modulo-equivalent input, $\tilde{s}$, rather
than of the original input itself. To recover s, the receiver needs to either perform
another modulo operation prior to detection, or to use a slicer based on a modulo-
extended constellation, as shown in Fig. 3-3. In either case, the receiver may make
errors that it would not have made had the original inputs been sent over a
noninterfering channel and without precoding. For example, this could happen if the
upper-right corner symbol of the constellation was sent and the noise had very strong,
but nonnegative, real and imaginary components. Therefore, the equivalent
noninterfering channel for a precoding system is not additive white Gaussian noise,
but rather a “modulo noise” channel with somewhat different properties. This issue
was studied by Wesel and Cioffi in [77] for precoding of intersymbol interference
channels. Once again, we will see in Section 3.1.3 that these differences can be
overcome.

Figure 3-3: Example of modulo-extended constellation for 4-QAM. (a) Original 4-QAM
constellation. (b) Modulo-extended version, with $\mathcal{A}$ outlined.

Since  m  is  not  known  a  priori,  the  precoding  will  not  actually  occur  in  the  order 
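shown in Fig. 3-2, but rather row-by-row as an intertwined linear operation
(multiplication by $H^{-1}$) and constellation expansion (the addition of $2AC \cdot m$).
The recursive form for ISI channels, as in Fig. 3-1, comes about using a matrix
factorization:

$$H^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ h[1] & 1 & 0 \\ 0 & h[1] & 1 \end{bmatrix}^{-1}
= \left( \begin{bmatrix} 1 & 0 & 0 \\ h[1] & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & h[1] & 1 \end{bmatrix} \right)^{-1}
= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -h[1] & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ -h[1] & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \tag{3.3}$$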

where  each  matrix  multiplication  represents  one  time  through  the  loop.  At  each 
stage,  the  next  element  of  m  is  determined  based  on  the  symbols  that  were  previously 
precoded.  Because  of  the  factorization  given  in  (3.3),  the  memory  only  needs  to  be 
as  long  as  the  channel  length. 
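
The row-by-row recursion can be illustrated with a minimal Python sketch. It assumes a 4-QAM alphabet (A = 2, C = 1), a noiseless channel, and a hypothetical tap value h[1] = 0.9 - 0.4j; the function names and all numbers are illustrative choices, not taken from the text. Each pass through the loop subtracts the interference predicted from previously precoded outputs and then applies the modulo operation.

```python
import math

def mod_region(v, half_width):
    """Shift the real and imaginary parts of v into (-half_width, half_width]."""
    def wrap(u):
        m = math.floor((half_width - u) / (2 * half_width))  # integer m[n]
        return u + 2 * half_width * m
    return complex(wrap(v.real), wrap(v.imag))

def th_precode(symbols, taps, half_width):
    """Row-by-row TH precoding for a monic causal channel.

    taps = [h[1], h[2], ...]; h[0] = 1 is implied.
    """
    x = []
    for n, s in enumerate(symbols):
        # Interference that earlier precoded outputs would cause at time n.
        isi = sum(h * x[n - d] for d, h in enumerate(taps, start=1) if n - d >= 0)
        x.append(mod_region(s - isi, half_width))
    return x

def channel(x, taps):
    """Noiseless monic convolution y[n] = x[n] + sum_d h[d] x[n-d]."""
    return [xn + sum(h * x[n - d] for d, h in enumerate(taps, start=1) if n - d >= 0)
            for n, xn in enumerate(x)]

A, C = 2, 1.0                      # 4-QAM: components take values -C or +C
symbols = [1 + 1j, -1 + 1j, 1 - 1j, -1 - 1j, 1 + 1j]
taps = [0.9 - 0.4j]                # hypothetical h[1], for illustration only
x = th_precode(symbols, taps, A * C)
# The receiver applies the same modulo before slicing.
received = [mod_region(y, A * C) for y in channel(x, taps)]
```

In this sketch the memory of the loop is just the list of taps, in line with the remark that it only needs to be as long as the channel length.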

A similar row-by-row procedure applies for any finite-size, lower-triangular matrix
H (with non-zero diagonal entries). The transmitter chooses the constellation
expansion parameters, m, such that $H^{-1}\tilde{s}$ is in the fundamental region
$\mathcal{A}$. The original minimum-phase and stability restrictions for ISI channels,
which ensured that the process remains stable, are not necessary for finite-length
data vectors. If the diagonal entries of H are not all one, their values may be
factored out into a separate diagonal matrix, where they can contribute directly to
the SNR of the channel.

The performance of the TH precoder can be contrasted with that of a purely linear
array processor that also eliminates interference. In this second case, the transmitter
could simply send $x = H^{-1}s$. However, the transmitted power, assuming an i.i.d.
input vector of length M, becomes

$$E[\|x\|^2] = \frac{1}{M}\,\mathrm{trace}\{(H^\dagger H)^{-1}\}\,E[\|s\|^2]. \tag{3.4}$$

Note that $\mathrm{trace}\{(H^\dagger H)^{-1}\}$ is the sum of powers of the elements in
$H^{-1}$. In other words, with channel inversion, all of the elements of $H^{-1}$
contribute to magnifying the transmitted energy, while in precoding, only the diagonal
elements do (again, to first order). We could also write the above equation (3.4) as

$$E[\|x\|^2] = \frac{1}{M}\sum_{n=1}^{M}\frac{1}{\sigma_n^2(H)}\,E[\|s\|^2],$$

where $\sigma_n(H)$ are the singular values of H. This shows that as the matrix H gets
close to singular, the increase in energy over the precoding solution can become very
large.
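
To make the size of this penalty concrete, the following Python sketch evaluates the power gain of channel inversion for the three-symbol convolution matrix used earlier, with a hypothetical real tap h[1] = 0.9 chosen only for illustration. It assumes the gain is the sum of squared magnitudes of the entries of the inverse divided by M, as in the i.i.d. case described above; the helper names are ours.

```python
def invert_unit_lower(H):
    """Invert a lower-triangular matrix with unit diagonal by forward substitution."""
    n = len(H)
    inv = [[0.0] * n for _ in range(n)]
    for col in range(n):
        for row in range(n):
            rhs = 1.0 if row == col else 0.0
            acc = sum(H[row][j] * inv[j][col] for j in range(row))
            inv[row][col] = rhs - acc
    return inv

def inversion_power_gain(H):
    """Sum of squared magnitudes of the entries of inv(H), divided by M."""
    inv = invert_unit_lower(H)
    return sum(abs(v) ** 2 for row in inv for v in row) / len(H)

h1 = 0.9                            # hypothetical tap value
H = [[1, 0, 0], [h1, 1, 0], [0, h1, 1]]
gain = inversion_power_gain(H)      # about 1.76 for this example
```

For this example the linear solution spends roughly 1.76 times the input energy, whereas the precoder, to first order, spends a factor of one because only the unit diagonal matters.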

3.1.2  Precoding  Over  Arbitrary  Channel  Matrices 

We now concentrate on the more important issue, that of communicating over
arbitrary matrix channels. In Chapter 2, we discussed linear solutions using a
beamforming matrix G to diagonalize the channel. However, at least in the case of a
triangular matrix, precoding can be much more efficient. Unfortunately, in the
precoding system of Fig. 3-2, a triangular matrix was crucial to providing an ordered,
layered structure to the interference. For arbitrary channel matrices, we therefore
follow [7] and [30] in proposing a two-step solution, to first convert the
$K \times M$ matrix (where $K \le M$) into a triangular channel, and then apply the
precoding algorithm of the previous section.

This two-step solution takes the form of a matrix factorization. Instead of using a
single G matrix to remove interference, we use separate beamforming and precoding
parts, $G_B$ and $G_P$, such that the combined effective channel $H G_B G_P$ is diagonal.
Putting this all together, and factoring out any power control into a diagonal matrix
D, we organize the system as follows:

Algorithm 1 (Zero-Forcing Precoding) Consider transmission from an M-element
array to K uncoordinated receivers (with $K \le M$) with a given matrix of fading
coefficients H. A precoding solution that results in no interference is

$$y = H G_B G_P D \tilde{s} + w, \tag{3.5}$$

where $H G_B G_P$ is designed to be diagonal and

•  $\tilde{s} = s + 2AC \cdot m$ is the modulo-equivalent vector of symbols to be
transmitted. We assume the constellation is chosen so that the transmitted symbols
satisfy the power constraint.

•  D is a diagonal matrix with diagonal elements $d_k$ controlling the amplitudes
sent to each receiver. We apply the constraint

$$\sum_{k=1}^{K} |d_k|^2 \le 1.$$

•  $G_P$ is a lower-triangular matrix describing the linear part of the precoding
operation. The diagonal elements of $G_P$ are all 1.

•  $G_B$ is the beamforming matrix consisting of orthonormal columns.

The $G_B$ and $G_P$ matrices can be easily computed using the H = LQ lower-
triangular decomposition. Let $G_B = Q^\dagger$ and $G_P$ be a scaled version of $L^{-1}$
so that the overall product is diagonal.
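
The computation of the two factors can be sketched in Python as follows. This is only an illustration under stated assumptions: the LQ factorization is obtained by classical Gram-Schmidt on the rows of H (adequate for a small, well-conditioned example, not a recommendation for numerical work), the 2 x 3 channel matrix is made up, and the function names are ours.

```python
def lq_decompose(H):
    """Gram-Schmidt on the rows of H: H = L Q, L lower-triangular, Q with orthonormal rows."""
    K = len(H)
    L = [[0j] * K for _ in range(K)]
    Q = []
    for k in range(K):
        residual = list(H[k])
        for j in range(k):
            # Projection coefficient of row k onto the j-th orthonormal row.
            L[k][j] = sum(a * b.conjugate() for a, b in zip(H[k], Q[j]))
            residual = [r - L[k][j] * q for r, q in zip(residual, Q[j])]
        norm = sum(abs(r) ** 2 for r in residual) ** 0.5
        L[k][k] = complex(norm)
        Q.append([r / norm for r in residual])
    return L, Q

def zf_precoding_matrices(H):
    """Return G_B = Q^dagger, G_P = inv(L) with columns scaled to unit diagonal, and diag(L)."""
    L, Q = lq_decompose(H)
    K, M = len(H), len(H[0])
    G_B = [[Q[k][m].conjugate() for k in range(K)] for m in range(M)]
    G_P = [[0j] * K for _ in range(K)]
    for c in range(K):
        G_P[c][c] = 1 + 0j
        for r in range(c + 1, K):
            # Forward substitution so that (L G_P)[r][c] = 0 below the diagonal.
            acc = sum(L[r][j] * G_P[j][c] for j in range(c, r))
            G_P[r][c] = -acc / L[r][r]
    return G_B, G_P, [L[k][k] for k in range(K)]

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

# Hypothetical channel: K = 2 receivers, M = 3 array elements.
H = [[1 + 1j, 0.5 + 0j, -0.3j], [0.2 + 0j, -1 + 0.5j, 0.7 + 0j]]
G_B, G_P, diag = zf_precoding_matrices(H)
effective = matmul(matmul(H, G_B), G_P)   # should equal diag(l_1, ..., l_K)
```

The product H G_B reduces to L because the rows of Q are orthonormal, and the column scaling of the inverse of L leaves exactly the diagonal entries of L in the effective channel.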

Actually, there will usually be $G_B$ matrices that are not orthonormal yet still
satisfy the other criteria, such as one derived from the LU decomposition for a
square H matrix. However, given our insistence on a lower-triangular $G_P$ and zero
interference, an orthonormal $G_B$ is sufficient to maximize received SNR.

Furthermore, an orthonormal $G_B$ makes it relatively easy to find a scaling factor C
to satisfy the power constraint. This way, the beamforming operation leaves the total
power of the precoded symbols unchanged. If we additionally use the approximation
that precoding adds no energy (i.e., that the “precoding power loss” is negligible),
then one only needs to select a constellation such that s satisfies the power
constraint, without worrying about the precoding and beamforming at all.

The different scalings of the $G_B$ and $G_P$ matrices illustrate how precoding is more
efficient than beamforming. To ensure (to first order) that neither one changes the
power of the symbol vector, each column of $G_P$ is scaled so that the diagonal element
is unity, while $G_B$ must be scaled down further so that the entire column has unit
norm.

Assuming there are a finite number of receivers, stability of the precoding
system is only in doubt if H does not have full row rank, i.e., if the receivers have
linearly dependent channel vectors. For most fading models and $M \ge K$, this occurs
with probability zero. The $m_k$ coefficients can also be kept below some threshold by
choosing not to transmit to particularly weak receivers.


3.1.3  Performance  of  Precoding 

Evaluating  the  performance  of  precoding  systems  is  complicated,  and  depends  on  the 
modulation,  coding,  and  other  signaling-level  implementations  that  are  used.  We 
begin  with  a  preliminary  discussion  on  “idealized”  performance,  and  later  describe 
how  to  deal  with  various  issues  that  cause  actual  performance  to  diverge  from  this. 


Idealized  Performance 

At a basic level, the spatial precoding solution outlined above changes the arbitrary
H matrix into a diagonal matrix, $H G_B G_P$. Thus, if we treat the modulo noise
as Gaussian additive noise and neglect the effect of the precoding power loss, what
results is a series of parallel additive noise channels. It is then straightforward to
determine the SNRs of these parallel channels in terms of the LQ factorization of
H. Recalling that we set the beamforming matrix $G_B$ equal to $Q^\dagger$, the diagonalized
channel matrix becomes

$$H G_B G_P = H Q^\dagger G_P = L G_P =
\begin{bmatrix} l_1 & 0 & \cdots & 0 \\ 0 & l_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & l_K \end{bmatrix},$$

where $l_k$ are the diagonal entries of L. The last equality results because $G_P$ was
specifically chosen to diagonalize the product, and furthermore is lower triangular
with diagonal entries of unity. Including the power control $d_k$, the channel to the kth
receiver takes the form

$$y_k = l_k d_k \tilde{s}_k + w_k, \quad k = 1, 2, \ldots, K, \tag{3.6}$$

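with received SNR equal to

$$\mathrm{SNR}_k = \frac{\mathcal{P}\,|l_k|^2 |d_k|^2}{\mathcal{N}_0}. \tag{3.7}$$

We will say that the system does not use power control if all of the $d_k$ parameters are
chosen equal to $1/\sqrt{K}$.

Similarly, the idealized instantaneous capacity of a coded link to the kth receiver
becomes

$$C_k = \log_2\left(1 + \frac{\mathcal{P}\,|l_k|^2 |d_k|^2}{\mathcal{N}_0}\right), \tag{3.8}$$

with a corresponding ergodic capacity of

$$C_{k,\mathrm{ergodic}} = E\left[\log_2\left(1 + \frac{\mathcal{P}\,|l_k|^2 |d_k|^2}{\mathcal{N}_0}\right)\right]. \tag{3.9}$$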

As  we  will  see  below,  these  rates  are  achievable  with  more  sophisticated  dirty-paper 
encoding  techniques,  supporting  our  use  of  (3.7)-(3.9)  as  performance  measures. 

The transmitter can adjust the power control and ordering among the streams to
satisfy particular criteria based upon individual-receiver or system-wide goals, outage
or ergodic capacity. We will say more about these choices, and propose algorithms
appropriate for various situations, in Section 3.2.2.

To compare the performance of precoding with other methods, sum capacity
strategies provide a good illustration and have been the subject of most research
up until now [30, 8, 82]. In these cases, the transmitter should use a waterfilling
power control policy over the parallel channels [14],

$$|d_k|^2 = \left(\lambda - \frac{\mathcal{N}_0}{\mathcal{P}\,|l_k|^2}\right)^{+},$$

where $\lambda$ is chosen to satisfy the power constraint. Using this optimal power control
and,  for  precoding,  the  max  sum  ordering  method  proposed  in  Section  3.2.2,  we 
show  in  Fig.  3-4  the  ergodic  sum  capacity  of  several  techniques  in  situations  where 
an  8-element  array  communicates  with  up  to  8  receivers.  In  these  simulations,  we 
assume  an  independent  Rayleigh  model  whereby  the  elements  of  the  channel  matrix 
H  are  i.i.d.  complex  Gaussian  variables.  With  eight  receivers,  precoding  achieves 
well  more  than  double  the  throughput  of  a  round-robin  timesharing  strategy.  Smarter 
scheduling,  as  in  [74],  improves  the  throughput  of  timesharing  only  slightly  compared 
with  precoding.  Linear  array  processing,  in  the  form  of  zero-forcing  beamforming, 
does  well  up  to  a  point,  but  eventually  degrades  as  the  transmitter  must  send  nulls 
to  too  many  receivers.  The  top  curve  represents  a  bound  on  performance,  showing 
the  highest  achievable  rate  when  the  receivers  can  coordinate  their  responses,  using 
Telatar’s system [63]. That precoding can get so close, at least at the selected input
SNR level, suggests that having coordination at either the transmitter or receiver side
is more important than having it at both sides.
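
The waterfilling power control used for the sum-capacity curves can be sketched in Python as below. The sketch assumes each parallel channel is summarized by a positive gain (playing the role of the per-stream SNR at full power) and that the power fractions must sum to one; the function name and the example gains are hypothetical.

```python
def waterfill(gains, total=1.0):
    """Maximize sum(log2(1 + g * p)) subject to sum(p) <= total and p >= 0.

    gains must be positive. Returns (power fractions, water level).
    """
    order = sorted(range(len(gains)), key=lambda k: gains[k], reverse=True)
    for active in range(len(gains), 0, -1):
        chosen = order[:active]
        # Water level if exactly the `active` strongest channels are used.
        level = (total + sum(1.0 / gains[k] for k in chosen)) / active
        if level > 1.0 / gains[chosen[-1]]:
            break                   # weakest active channel gets positive power
    powers = [0.0] * len(gains)
    for k in chosen:
        powers[k] = level - 1.0 / gains[k]
    return powers, level

gains = [4.0, 1.0, 0.25]            # hypothetical per-stream gains
powers, level = waterfill(gains)    # powers == [0.875, 0.125, 0.0]
```

In this example the weakest stream receives no power at all, which is the same effect as choosing not to transmit to a particularly weak receiver.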

Although not shown in the figure, our simulations also suggest that except at very
low SNR, power control plays only a secondary role in maximizing the sum capacity. This
is  in  line  with  results  for  communicating  over  parallel  channels  in  frequency  (see,  e.g., 
[12]).  We  will  also  see  in  Section  3.2.2  that  a  random  ordering  of  streams  causes  some 
loss  in  sum  capacity,  but  still  performs  well. 

More care must be taken for situations where the individual-receiver outage is
most important, because precoding often results in performance asymmetries among
the various streams. This results from the triangularization step of the precoding
algorithm (3.5), where the beamforming matrix $G_B$ must steer more nulls for some
streams than others. Quantitatively, this is evident by looking at probability
distributions of $|l_k|^2$ (from the idealized SNR (3.7)) over the random ensemble of
channel matrices H. For our independent Rayleigh model with an M-element array and
random ordering of K streams, we apply the LQ factorization result quoted in [18] and
see that $|l_k|^2$ will have a $\chi^2_{2(M-k+1)}$ distribution, or equivalently an Erlang
distribution with $M - k + 1$ degrees of freedom. This means that the kth receiver has
the same outage performance as in a single-receiver system with a transmit array of
$M - k + 1$ elements, if we correct for the fact that it only gets a fraction of the
total transmitted power. This still compares favorably with a system using zero-forcing
beamforming, where all receivers get the weakest of these distributions (Erlang of
order $M - K + 1$). However, increasing the effective order of this weakest receiver’s
distribution by only one or two could mean a dramatic improvement in outage (recall
Fig. 2-3). Strategies that maximize sum capacity tend to only increase the asymmetry
among receivers, so we will also address the issue of providing more equitable
performance among receivers in Section 3.2.2.

Figure 3-4: Ergodic sum capacity for spatial precoding compared with other methods,
when transmitting from an 8-element array to up to 8 receivers (horizontal axis:
number of receivers). Results are from simulations assuming independent Rayleigh
fading at 5 dB SNR per link. The top curve represents an upper bound based on
coordinated receivers.

Approaching the Idealized Performance

A straightforward TH precoding implementation will differ from the idealized
performance for a number of reasons, which we have touched upon but summarize here:

•  Modulo noise: The modulo operations modify the noise distribution from
additive Gaussian noise into a modulo-noise channel. The constellation symbols
adjacent to the boundary of the fundamental region $\mathcal{A}$ will then have smaller
decision regions, increasing the probability of error.

•  Precoding power loss: The original symbols, s, are usually chosen from a discrete
set of points within $\mathcal{A}$, while the precoded symbols $G_P D \tilde{s}$ will have a
more uniform distribution over their fundamental regions. In most cases, this causes
an increase in transmitted power.

•  Shaping gain: With a QAM constellation, the constellation points are
distributed over a Cartesian product of square regions $\mathcal{A}$. However, for
maximum power efficiency, the transmitted symbols should instead be distributed
over a higher-dimensional sphere [44]. The difference in transmitted power is
quantified by a “shaping gain” that must be bridged to achieve optimal
performance. The maximum shaping gain occurs at high SNR, where it is equal to
$\log_2(\pi e/6) \approx 0.51$ bits per two dimensions.

These factors vary in importance depending upon the regime of operation, and
can be addressed in different ways. For example, at high SNR, the first two issues
become negligible and only a shaping loss remains. One can then adapt shaping
techniques that were previously used for the intersymbol interference channel. Although
the ISI coder of [42] is not appropriate for layered interference across streams, an
alternate method, trellis precoding [24], was implemented by Yu and Cioffi [83] and
shown to achieve reasonable shaping gains. For low-rate precoding, we introduce in
Section 3.2.3 a method for reducing the precoding power loss in certain situations
with structured interference.

Another option, potentially more complex but offering a more unified approach,
is to apply recent methods from the information embedding community that are
essentially implementations of Costa’s dirty-paper encoding [13]. These are reviewed
in [84] and include quantization index modulation and nested lattices [10, 4, 20]. The
remainder of this subsection will be a brief overview of how they apply to spatial
precoding, a connection first made by Caire and Shamai [7].

Consider the transmission of a symbol $s_k$ with interference b caused by previously-
precoded streams. The precoder uses a lattice code consisting of a “coarse” sublattice
and translates of this sublattice called cosets. For example, in the modulo-extended
constellation of Fig. 3-3, a coset consists of all symbol points of one type.
As shown in Fig. 3-5, the embedding “quantizes” the interference signal to the
nearest point in the coset selected by $s_k$. The transmitter then sends e, the difference
between this quantization point and the expected interference.

Although the preceding example is just another description of TH precoding,
we now add several elements to approach the idealized performance. If the
distribution of the precoded symbol, e, is not already approximately uniform over
the Voronoi region of the coarse lattice (the dashed box of Fig. 3-5b, but shifted
to have zero mean), then the transmitter can add a pseudonoise dither signal,
known at both transmitter and receiver, to the entire lattice prior to embedding.
The average transmitted power can now be easily computed from the Voronoi region.
Next, more efficient transmission is made possible by coalescing several time instances
of the embedding problem together and doing vector quantization. The use of good,
higher-dimensional nested lattices simultaneously provides coding gain (by increasing
the minimum distance in the fine lattice) and shaping gain (by making the Voronoi
region of the coarse lattice more like a higher-dimensional sphere). Finally, the modulo
noise and precoding power loss are overcome by a technique known as noise cooling
or distortion compensation, which requires a few more words.

Recall that in many estimation problems, mean-square error can be improved by
intentionally leaving in some interference. Similarly, the slicer error for precoding
systems can be improved by shifting the balance between noise and interference. As
described in [10] and [84], the encoder multiplies the interference it expects by a real
constant $\alpha$ (less than or equal to one) before quantizing to it, therefore sending

$$e = \tilde{s}_k - \alpha b,$$

where again $\tilde{s}_k$ is the modulo-equivalent message symbol. The receiver then
multiplies its signal by $\alpha$ before slicing, producing

$$\alpha y_k = \alpha(e + b + w) = \tilde{s}_k + [\alpha w - (1 - \alpha)e]. \tag{3.10}$$

Figure 3-5: Embedding of the symbol “00” for the example of Fig. 3-3. (a) Quantization
to coset. (b) Region of possible interference signals that would get quantized to the
same point.

Now, the noise power has been reduced to $\alpha^2 \mathcal{N}_0$ at the expense of a new
self-noise term of power $(1 - \alpha)^2 \mathcal{P}_k$, where $\mathcal{P}_k$ is the average
transmitted power of this stream. With optimal lattices in higher dimensions, this
self-noise behaves as i.i.d. Gaussian noise, independent of the other terms in (3.10).
The $\alpha$ for receiver k that maximizes the overall received signal to noise ratio is

$$\alpha_k = \frac{\mathcal{P}_k}{\mathcal{P}_k + \mathcal{N}_0},$$

and increases this received SNR by one. In this way, the system achieves the idealized
SNR of $\mathcal{P}_k/\mathcal{N}_0$. Distortion compensation is incorporated into our matrix
formulation by multiplying the off-diagonal elements of row k of the precoding matrix
$G_P$ by $\alpha_k$. For example,

$$G_P = \begin{bmatrix} 1 & 0 & 0 \\ G_{2,1} & 1 & 0 \\ G_{3,1} & G_{3,2} & 1 \end{bmatrix}
\longrightarrow
\begin{bmatrix} 1 & 0 & 0 \\ \alpha_2 G_{2,1} & 1 & 0 \\ \alpha_3 G_{3,1} & \alpha_3 G_{3,2} & 1 \end{bmatrix}.$$

Receiver k then just needs to multiply its input $y_k$ by $\alpha_k$.
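
The effect of the scaling constant can be checked numerically with a short Python sketch. It assumes the slicer SNR is the stream power divided by the sum of the scaled noise and the self-noise, treated as independent as described above; the power and noise values are hypothetical.

```python
def effective_snr(alpha, signal_power, noise_power):
    """Slicer SNR when the encoder and receiver both scale by alpha."""
    scaled_noise = alpha ** 2 * noise_power          # alpha * w term
    self_noise = (1 - alpha) ** 2 * signal_power     # (1 - alpha) * e term
    return signal_power / (scaled_noise + self_noise)

P_k, N_0 = 2.0, 1.0                 # hypothetical stream power and noise power
alpha_opt = P_k / (P_k + N_0)
plain = effective_snr(1.0, P_k, N_0)         # no distortion compensation: 2.0
best = effective_snr(alpha_opt, P_k, N_0)    # about 3.0, i.e. plain + 1
```

With these numbers the compensated slicer SNR is one higher than the uncompensated value, and moving the constant slightly away from its optimum in either direction lowers it.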

Whether a system chooses to implement nested lattices and distortion compensation
or the simpler TH precoding with shaping depends on the potential benefits
and complexity. At high SNR, the modulo noise and precoding power loss disappear,
making distortion compensation unnecessary; $\alpha$ just degenerates to one. At
lower SNR, high-dimensional lattices with well-shaped Voronoi regions are necessary
to make the self-noise term look Gaussian. This can lead to higher complexity and
decoding delays. In Section 3.2.3, we will explore a different method of reducing the
precoding power loss that is applicable to a few particular interference distributions
that may come up in spatial precoding.

3.1.4  Improving  Upon  Zero-Forcing  Precoding 

In  our  matrix  factorization  approach  to  precoding,  we  assumed  that  the  transmitter 
wished  to  create  a  diagonal  effective  channel,  causing  no  interference.  It  has  been 
recently  shown,  for  the  two-receiver  case  by  Caire  and  Shamai  [7]  and  for  the  general 
case  by  several  authors  [82,  71,  75],  that  the  sum  rate  can  be  improved  somewhat  by 
allowing  some  amount  of  interference,  and  that  this  form  exactly  achieves  the  sum 
capacity of the channel. It is not known whether modifications of this solution can
achieve the entire achievable region of rate K-tuples.

We  will  continue  on  with  the  zero-forcing  (that  is,  no  interference)  version,  for 
a number of reasons. First of all, it leads to easier computation and analysis; as of
now, there is no known closed-form expression or provably convergent optimal
iterative algorithm to compute the more general precoder. This is especially important
because we are interested in the distribution of performance among receivers, not
only the maximum sum rate operating point. Secondly, by considering precoding and
the receiver-cooperation bound in Fig. 3-4 and other examples, it appears that the
great majority of the benefit of multiple antenna elements and multiple receivers is
attainable by zero-forcing precoding, at least in this SNR regime. Furthermore, Caire
and Shamai showed that in the limits of high SNR (where interference matters more
than any additive noise) and low SNR (where only one of the streams is sent with
nonzero power), zero-forcing precoding also achieves the maximum sum rate.

3.1.5  Relation  to  Other  Matrix  Channel  Problems 

Our matrix factorization description applies not only to precoding systems, but also
to a variety of other scenarios involving multiple antenna elements at both the
transmitter and receiver sides. We will see that a wide variety of algorithms can be
incorporated under this common framework. This process helps to categorize results
from the literature, place precoding within this larger context, and perhaps lead to
new algorithms within this family.

The different channel scenarios under consideration all share the same mathematical
channel model (3.1) but vary depending on whether there is coordination at the
transmitter side, receiver side, or both. So far, we have looked exclusively at the case
where only the transmitter elements can coordinate. The opposite might be true for
the uplink direction, and a formal duality between this so-called multiple access
channel and precoding has been recently shown [75, 71]. Below, we demonstrate how these
and other techniques can be subsumed under the idea of diagonalizing the channel
matrix using factors corresponding to two types of operations:

•  Linear:  Simple  matrix  multiplication,  i.e.,  beamforming 

•  Interference cancellation: Intertwined matrix multiplication and nonlinear
interference subtraction

For  example,  in  precoding,  the  transmitter  performs  an  LQ  factorization  of  H,  where 
the Q operation is of the first type and L is of the more efficient, second type.

Receiver-Side  Coordination 

Consider a situation where a transmitter sends an independent data stream from each
antenna element to a coordinated receiver array. If there are at least as many receivers
as transmit antenna elements, then with probability one, the receivers could remove
interference using a single-step linear operation. This receiver-based beamforming
takes the form of a left multiplication by a matrix $G_B$,

$$G_B y = G_B H s + G_B w, \tag{3.11}$$

such that $G_B H$ is diagonal. A more efficient method, however, is to only triangularize
the channel in this way, then use a final interference cancellation step. By analogy
to precoding, the receiver detects and decodes the streams in the order implied by the
triangularization, and at each step subtracts off the interference caused by previously-
detected streams.
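
The final cancellation step can be sketched in Python as follows, starting from a channel that has already been triangularized. The sketch assumes uncoded 4-QAM symbols, a lower-triangular effective channel, and small noise values; the matrix entries, noise samples, and function names are all hypothetical.

```python
def slice_4qam(v):
    """Nearest-neighbor decision onto the 4-QAM points with components -1 or +1."""
    return complex(1.0 if v.real >= 0 else -1.0, 1.0 if v.imag >= 0 else -1.0)

def successive_cancellation(y, L):
    """Detect streams in order on a lower-triangular channel y = L s + w."""
    decisions = []
    for k in range(len(y)):
        # Interference from streams that have already been detected.
        interference = sum(L[k][j] * decisions[j] for j in range(k))
        decisions.append(slice_4qam((y[k] - interference) / L[k][k]))
    return decisions

# Hypothetical triangularized channel, transmitted symbols, and noise.
L = [[1.2, 0, 0], [0.5 - 0.3j, 0.9, 0], [-0.4j, 0.8, 1.1]]
sent = [1 + 1j, -1 + 1j, -1 - 1j]
noise = [0.1 - 0.05j, -0.1j, 0.05 + 0j]
y = [sum(L[k][j] * sent[j] for j in range(3)) + noise[k] for k in range(3)]
detected = successive_cancellation(y, L)
```

Unlike the precoder, this receiver relies on earlier decisions being correct, so a wrong decision on one stream would corrupt the interference estimate for every later stream.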

This two-step receiver has appeared in the literature in different contexts and
under many names. For intersymbol interference channels, it is known as the decision-
feedback equalizer; in multiple antenna-element wireless, the V-BLAST system [26];
and in CDMA (where H represents the spreading sequences), successive interference
cancellation [17]. As with precoding, the higher efficiency of the interference
cancellation operation can be seen by considering the different constraints on two
matrix factors. To avoid noise enhancement, each row of $G_B$ should have unit norm,
while the interference cancellation factor has unit diagonal elements (since determining
and subtracting off interference does not enhance the noise). Under the idealized
assumption that previously-ordered streams are detected perfectly, the received SNRs
and maximum coded rates of the streams correspond to those of precoding, with the
modification that we now perform an LQ factorization of $H^\dagger$ rather than of H.
Similarly, one can achieve the sum capacity by not requiring the linear factors to
strictly triangularize the channel matrix [3].

The duality between precoding and receiver-side interference cancellation is
apparent, and a choice between the two methods depends on where the burden of
computation and coordination should lie within a system. There are important practical
distinctions as well. For instance, a precoding system may fall short of the achievable
performance by not using perfect dirty-paper encoding, while receiver-based array
processing can fail if there is too much error propagation from previously-detected
streams.


Coordination  at  Both  Sides 

When  a  system  has  coordination  at  both  transmitter  and  receiver  arrays,  new  pos¬ 
sibilities  open  up.  In  addition  to  all  the  previous  strategies,  one  could  use  an  LQ 
decomposition  to  perform  beamforming  at  the  transmitter  and  interference  cancella¬ 
tion  at  the  receiver.  This  is  applicable  for  K  <  M  and  achieves  the  same  SNR  or 
sum  rate  performance  as  precoding,  but  with  the  tradeoffs  associated  with  receiver 
interference  cancellation  (such  as  having  to  deal  with  error  propagation,  but  not  extra 
modulo  operations).  More  interesting,  though,  are  different  types  of  factorizations. 

Interestingly, Telatar [63] showed that the sum rate is maximized by splitting the
processing with a singular value decomposition,

$H = U \Sigma V^\dagger$,

where U and V are unitary and $\Sigma$ is diagonal with nonnegative entries. Transmit
beamforming is done with V and receiver beamforming with $U^\dagger$. A geometric
interpretation would be that we now transmit along principal directions rather than using
a Gram-Schmidt (i.e., LQ) decomposition.
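To make the parallel-channel structure explicit, the following short derivation may
help. It is a sketch under assumed notation that is not defined in the text above: x is
the transmitted vector, s the vector of data symbols, y the received vector, and n the
additive noise.

```latex
y = Hx + n, \qquad x = Vs
\;\;\Longrightarrow\;\;
U^\dagger y = U^\dagger U \Sigma V^\dagger V s + U^\dagger n
            = \Sigma s + \tilde{n},
\qquad \tilde{n} = U^\dagger n .
```

If the noise is white and circularly symmetric Gaussian, the unitary rotation leaves
its statistics unchanged, so each stream sees an interference-free channel whose gain
is the corresponding singular value.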


Another interesting choice would be to perform precoding at the transmitter (to
remove interference from earlier streams) and interference cancellation at the receiver
(working in the opposite direction, to remove interference from later streams). This
frees the beamforming to do single-user matched filtering. The precoding and
interference cancellation operations amount to performing a Cholesky $LL^\dagger$
decomposition on $HH^\dagger$. To be more specific, we assume i.i.d. precoded symbols
and let the beamforming matrix be


$G_b = \frac{H^\dagger}{\sqrt{\frac{1}{K}\,\mathrm{trace}\{HH^\dagger\}}}$,


which will then satisfy the power constraint. Next, the precoder must make the
effective channel matrix $HG_b$ triangular so that the receiver's interference canceller
can do its job. Since the precoding matrix itself must be triangular, the Cholesky
decomposition is natural. The precoding matrix will be $(L^\dagger)^{-1}$, but scaled such
that the diagonal entries are one. It turns out that this L is the same matrix as the L
from the LQ decomposition of H. When all of this is done, the final received SNRs
become


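$\mathrm{SNR}_k = \frac{E_s |l_{kk}|^4}{N_0} \cdot \frac{1}{\mathrm{trace}\{HH^\dagger\}}
= (\mathrm{SNR}_k \text{ from precoding}) \cdot \frac{|l_{kk}|^2}{\frac{1}{K}\sum_{i=1}^{K} \|h_i\|^2}$.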


This can lead to some interesting SNR distributions, increasing the performance of
the receivers with better channels. However, there are two main deficiencies of this
method. First, the perfect information embedding that was assumed in the SNR
computation above is not achievable, since distortion compensation (which involves
changing the off-diagonal entries of the precoding matrix) will affect the signal power
of each stream. Second, power control is more difficult to do because the
beamforming matrix is not orthogonal.

Table 3.1 presents a summary of several of the different methods discussed here.
Recall from Fig. 3-4 that coordination at only one side may actually achieve close to
the same performance as coordination at both sides.

                        No coordination among Rx         Coordination among Rx

No coordination at Tx   N/A                              Beamforming alone: K - M + 1
                                                         V-BLAST, LQ of $H^\dagger$: K - M + i

Coordination at Tx      Beamforming alone: M - K + 1     Hybrid, LQ of H: M - K + i
                        Precoding, LQ of H: M - K + i    SVD, $H = U \Sigma V^\dagger$: maximizes sum capacity

Table 3.1: Summary of array processing algorithms for a variety of scenarios with
M transmit antenna elements and K receiver antenna elements. Shown are the
corresponding matrix factorizations, as well as a measure of performance in terms of
the order of the Erlang distribution of idealized SNR for the ith stream. “Hybrid”
refers to beamforming at the transmitter and interference cancellation at the receiver
side.

3.2 Implementation Issues

3.2.1  Overview  of  New  Implementation  Issues 

Although  the  methods  already  discussed  provide  a  theoretical  basis  for  precoding, 
much  work  remains  in  the  design  of  practical  precoding  solutions.  For  example,  when 
adapting  techniques  derived  from  the  information  embedding  literature,  one  must  be 
aware  of  the  different  contexts  in  which  the  two  problems  come  up.  We  summarize 
some  of  the  key  issues  below: 

•  When  information  is  sent  to  more  than  two  receivers,  a  stream  can  be  part  of 
both  an  embedding  and  several  hosts.  This  allows  the  transmitter  to  rearrange 
the  ordering  of  streams  to  achieve  different  performance  tradeoffs.  Additionally, 
it  may  divide  up  the  available  power  in  a  number  of  ways.  We  discuss  these 
issues  in  more  detail  in  Section  3.2.2,  finding  that  the  ordering  and  power  control 
can  play  an  important  role. 

•  The would-be interference is not some arbitrary signal, but rather a linear
combination of symbols from previously precoded streams. Therefore, it may have
certain properties, such as a particular discrete distribution, that the precoder
may be able to exploit. We look at precoding for some of these situations in
Section 3.2.3.

•  Precoding and information embedding often operate in different regimes due to
the goals and constraints of their respective problems. Many times in embedding
applications, one wishes to hide a small amount of information without a
noticeable degradation in the host signal. To satisfy this maximum distortion
constraint, embedding rates tend to be smaller than one bit per host dimension.
By contrast, zero-forcing precoding does not cause any distortion in the earlier
streams. Instead, we have a power constraint, which is often much larger and
allows higher-rate transmission. This can lead to different types of modulation
and encoding techniques and, as we have seen, different nonidealities in the
precoding process itself that must be considered.

•  To achieve high data rates for wireless applications, complexity can become a
major issue. Ideally, both the transmitter and receiver should perform only
simple operations. If the receivers are battery-operated, complexity there
becomes even more important. Simple embeddings may be preferable over nested
lattices.

•  Because the precoder’s “host” signal involves interaction with a channel
matrix, the algorithm depends critically on the transmitter having knowledge of
these channel characteristics. Similarly, the receiver must have some
information about the channel and the encoding. We discuss some of these issues in
Section 3.2.4.

In  the  next  few  sections,  we  look  at  several  system  components  that  address  one 
or  more  of  these  issues. 

3.2.2  Ordering  of  Streams 

The  order  in  which  the  streams  are  precoded  will  have  an  effect  on  their  associated 
receivers’  performance.  This  suggests  the  need  for  practical  algorithms  that  match  the 
ordering  to  specific  performance  goals.  In  this  section,  we  concentrate  on  optimizing 
according  to  two  basic  criteria,  sum  capacity  and  individual-receiver  outage. 

Several authors have recognized the importance of this ordering, but so far detailed
analysis and algorithms have been lacking. Caire and Shamai [7, 8] discussed this
issue (for both zero-forcing and more general precoding) and stated the solution
for two-receiver sum capacity. For larger numbers of receivers, Yu and Cioffi [82]
proposed an iterative algorithm for approaching the sum capacity, but were able
neither to prove convergence nor to give a closed-form solution. They and others
[71, 75] also discuss a rate region that encompasses all precoding solutions, but do
not offer any additional algorithms for reaching specific operating points of interest.
None of these works deals directly with optimizing single-receiver outage.

Consider first a random ordering of K streams. When the later streams are
beamformed to avoid interference to earlier ones, they incur a loss in channel quality.
As we have seen, the first receiver gets the full Mth-order diversity, the second receiver
M - 1, and the Kth receiver M - K + 1. If we wish to transmit to each receiver reliably
at a constant rate, however, we would prefer greater symmetry among these receivers.
On the other hand, to maximize throughput (the sum capacity across all receivers),
it will turn out that an asymmetrical distribution is better. In either case, the SNR
distribution resulting from a stream ordering can be augmented by appropriate power
control.

One basic constraint for all possible orderings can be stated as follows:

Proposition 1 Before power control, the product of received SNRs is independent of
the ordering.

For a full-rank, square H, this is equivalent to saying that the square magnitude of
the determinant of L from the LQ decomposition is independent of the ordering.
This is true because, using E to specify the permutation matrix, so that EH = LQ,

$|\det L|^2 = |\det(E H Q^\dagger)|^2 = |\det H|^2$

regardless of the permutation E. The last equality follows because the determinant
of a unitary matrix or permutation matrix has magnitude 1. For H not square, we
can create a square matrix with the same product of singular values by adding extra
rows that are orthogonal to each other and the other rows of H. For H not full-rank,
the product is always zero. A corollary from this proof is that this product of SNRs is
also equal to the product of the square magnitudes of singular values of H, which
are the SNRs of the parallel channels used in the Telatar scheme (also before power
control) where the receivers can cooperate. □
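Proposition 1 can be checked numerically on a small example. The sketch below is
illustrative only: the 3x3 matrix H is a made-up channel, and lq_diag_sq is a
hypothetical helper that runs Gram-Schmidt over the rows in the given order and
returns the squared diagonal entries of L, which are proportional to the received SNRs
before power control.

```python
import itertools
import math

def lq_diag_sq(rows):
    """Return |l_kk|^2 for each row, via Gram-Schmidt in the given row order."""
    basis, gains = [], []
    for row in rows:
        r = list(row)
        for q in basis:
            # Remove the component of r along the orthonormal direction q.
            c = sum(qi.conjugate() * ri for qi, ri in zip(q, r))
            r = [ri - c * qi for ri, qi in zip(r, q)]
        g = sum(abs(x) ** 2 for x in r)
        gains.append(g)
        basis.append([x / math.sqrt(g) for x in r])
    return gains

# Hypothetical full-rank 3x3 channel matrix (one row per receiver).
H = [[1 + 1j, 0.5, -0.2j],
     [0.3, 2 - 1j, 0.7],
     [-1j, 0.4, 1.5 + 0.5j]]

# Product of the squared diagonal entries for every possible ordering.
products = [math.prod(lq_diag_sq([H[k] for k in perm]))
            for perm in itertools.permutations(range(len(H)))]
spread = max(products) - min(products)
```

The individual gains change with the ordering, but spread stays at the level of
floating-point rounding, as the proposition predicts.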


Proposition  2  Power  control  can  only  decrease  the  product  of  received  SNRs. 


Say that the SNRs before power control (that is, sending an equal fraction of power
1/K to each receiver) are $\beta_k / K$, k = 1, 2, ..., K. We then use a different power
distribution to achieve the SNRs $|d_k|^2 \beta_k$, where $\sum_k |d_k|^2 = 1$. Instead of
taking the product of SNRs, we can look at the monotonic function 1/K times the
logarithm of this number. Before power control, we get

$\frac{1}{K} \log\left(\prod_{k=1}^{K} \frac{\beta_k}{K}\right) = \log\frac{1}{K} + \frac{1}{K}\sum_{k=1}^{K} \log(\beta_k)$,

while after power control,

$\frac{1}{K} \log\left(\prod_{k=1}^{K} |d_k|^2 \beta_k\right) = \frac{1}{K}\sum_{k=1}^{K} \log(|d_k|^2) + \frac{1}{K}\sum_{k=1}^{K} \log(\beta_k)$.

Since the mean of the $|d_k|^2$'s is 1/K, we can invoke Jensen's inequality to show that
the first computation is at least as large as the second. □
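A quick numerical illustration of Proposition 2 follows. The beta values and the
unequal power split are arbitrary, hypothetical numbers; the function name is likewise
an assumption made for this sketch.

```python
import math

def snr_product(betas, fractions):
    """Product of received SNRs when stream k gets the given fraction of power."""
    assert abs(sum(fractions) - 1.0) < 1e-12, "power fractions must sum to 1"
    return math.prod(f * b for f, b in zip(fractions, betas))

betas = [8.0, 3.0, 0.5]                          # hypothetical beta_k values
equal = snr_product(betas, [1 / 3] * 3)          # no power control
skewed = snr_product(betas, [0.5, 0.3, 0.2])     # some unequal power split
```

Here equal exceeds skewed, and the same holds for any other split that sums to one,
since the product of the fractions is largest when they are all equal.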

We  now  go  on  to  study  two  performance  criteria  in  more  detail. 

Maximizing  Sum  Capacity 

The  goal  here  is  to  maximize  the  sum  capacity  to  all  receivers.  The  algorithm  for 
a  particular  channel  realization  will  be  the  same  whether  we  consider  instantaneous 
capacity  or  ergodic  capacity,  because  in  the  latter  case,  one  only  hits  a  maximum  by 
optimizing  the  sum  rate  over  each  realization.  Furthermore,  if  the  channels  are  all 
i.i.d.  and  vary  ergodically  over  time,  then  maximizing  the  sum  capacity  at  each  time 
will  also  result  in  each  receiver  achieving  the  same  average  rate,  thus  also  achieving 
a  degree  of  “fairness.” 

As discussed earlier, for a particular channel realization and ideal embedding, the
set of streams is effectively sent to the associated receivers through K parallel
channels with rates $\log_2(1 + \mathrm{SNR}_k)$, where $\mathrm{SNR}_k$ is the received SNR
for stream k. This SNR is determined from the LQ decomposition associated with a
particular ordering of the rows of H. Since in this chapter each stream has only a single
receiver, we interchangeably talk about ordering streams or receivers. To maximize the
sum rate, one must also use power control to waterfill across the different streams.
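As a reminder of how waterfilling works over K parallel channels, here is a minimal
sketch. It assumes that gains[k] is the SNR stream k would see per unit of transmit
power (a hypothetical normalization, not the thesis's notation) and that all gains are
positive.

```python
import math

def waterfill(gains, total_power=1.0):
    """Return per-stream powers p_k maximizing sum log2(1 + p_k * g_k)."""
    order = sorted(range(len(gains)), key=lambda k: gains[k], reverse=True)
    powers = [0.0] * len(gains)
    # Try using the n strongest streams, dropping the weakest until all powers are positive.
    for n in range(len(order), 0, -1):
        active = order[:n]
        level = (total_power + sum(1.0 / gains[k] for k in active)) / n
        if level > 1.0 / gains[active[-1]]:
            for k in active:
                powers[k] = level - 1.0 / gains[k]
            break
    return powers

def sum_rate(gains, powers):
    """Sum of log2(1 + SNR_k) over the parallel channels."""
    return sum(math.log2(1.0 + p * g) for p, g in zip(powers, gains))

gains = [4.0, 1.0, 0.1]                  # hypothetical per-unit-power SNRs
powers = waterfill(gains)
equal_powers = [1.0 / len(gains)] * len(gains)
```

For these example gains the weakest stream receives no power, and sum_rate(gains,
powers) is larger than sum_rate(gains, equal_powers).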

Some  guidelines  on  ordering  streams  follow. 

Rule 1 (sum capacity) For two streams (K = 2), the one whose receiver has the
larger SNR should be first.

This was stated in [7], and a proof is given here in Appendix A. The same result holds
when waterfilling is not used, and can be shown with a simple convexity argument.

When  there  are  more  than  two  streams,  the  optimal  ordering  is  still  unknown. 
However,  one  rule  that  must  be  followed  is: 

Rule 2 (sum capacity) For K > 2 consider any two consecutive streams, indexed
k and k + 1. When projected away from the first k - 1 receivers’ channel vectors, the
stream with the larger SNR of the two should be first, i.e., given index k.

The ordering of the two streams under consideration will not affect the SNRs of the
other K - 2 receivers. Applying the two-stream result, we can say that for every
possible way of splitting the available power between these two and all the others,
waterfilling within each grouping will result in a higher sum rate with the ordering
implied above rather than the reverse. By similar reasoning, this rule also holds when
power control is not used.

This  rule  alone  does  not  imply  a  unique  ordering,  however.  The  following  “greedy” 
algorithm  will  satisfy  the  rules  above: 

Algorithm 2 (sum capacity) Choose the receiver with the strongest overall
channel to be receiver 1. Then, project all other channel vectors away from this direction.
Choose the strongest one of these as receiver 2, and project all remaining channel
vectors away from both receivers 1 and 2. Choose the strongest one of these as receiver
3, etc.
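The greedy procedure of Algorithm 2 can be sketched in a few lines. This is an
illustrative implementation rather than the code used for the simulations; the function
names are hypothetical, and H is assumed to be given as a list of rows, one channel
vector per receiver.

```python
import math

def inner(u, v):
    """Inner product <u, v> with conjugation on the first argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def greedy_sum_capacity_order(H):
    """Return (order, gains): receiver indices in precoding order and |l_kk|^2."""
    residual = {k: list(row) for k, row in enumerate(H)}
    order, gains = [], []
    while residual:
        # Pick the receiver whose projected channel vector is currently strongest.
        best = max(residual, key=lambda k: inner(residual[k], residual[k]).real)
        r = residual.pop(best)
        g = inner(r, r).real
        order.append(best)
        gains.append(g)
        if g > 0:
            q = [x / math.sqrt(g) for x in r]
            # Project every remaining vector away from the chosen direction.
            for k in list(residual):
                c = inner(q, residual[k])
                residual[k] = [vi - c * qi for vi, qi in zip(residual[k], q)]
    return order, gains

# Hypothetical two-receiver, two-antenna example.
order, gains = greedy_sum_capacity_order([[1.0, 1.0], [2.0, 0.0]])
```

In this example the receiver with the larger channel norm is placed first, and the
other receiver keeps only the part of its channel vector orthogonal to it.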

The Matlab command [Q,R,E] = qr(A') will produce this ordering. This was used,
along with waterfilling, to produce the precoding curve in Fig. 3-4. Although this
algorithm satisfies the rules given above, it is not always optimal. For example, let

Leaving the streams in the given ordering is optimal for sum capacity, although our
proposed algorithm would do otherwise.

To test the significance of the stream ordering, we plot simulated ergodic sum
capacities in Fig. 3-6 for several possible algorithms. The proposed algorithm gains
around 1 bit per channel use over a random ordering throughout most of the given
range of SNRs. Apparently, concentrating higher performance toward a small number
of receivers can have an impact. Along these lines, a lower-complexity approximation
to this algorithm would be to simply order the receivers by their channel strengths,
without regard to the interdependencies. For the simulation, this led to almost the
same performance as the original algorithm. Reversing the order, from weakest to
strongest (but still using waterfilling), causes a loss of up to an additional bit over the
random ordering. Throughout, this simulation assumed that the receivers have
identically distributed Rayleigh channel coefficients. The effect of ordering on capacity
will be even greater if some receivers have stronger channels than others.

Figure 3-6: Sum capacities, from simulations, for various stream orderings with M = 8
transmit antenna elements and K = 8 receivers. Waterfilling is used.


Minimizing Individual-Receiver Outage


The sum capacity strategy potentially sacrifices the performance of some receivers
in favor of the “greater good.” This is fine as long as sum capacity is of primary
importance, or if the channel varies ergodically and receivers can tolerate performance
fluctuations. In other situations, however, it may be more important for individual
receivers to maintain strong rates through (almost) all channel realizations. This
might be true in a non-adaptive uncoded system with constant-rate transmission, or
if a strong sense of “fairness” across receivers is most important, or if the channel
varies extremely slowly with time.

Minimizing outage calls for a more conservative strategy that maximizes the
performance of receivers with the weakest channel vectors. Ideally, all receivers would
achieve the same SNR, which from Propositions 1 and 2 would reach its maximum

$\left(\prod_{k=1}^{K} \frac{\beta_k}{K}\right)^{1/K}$,    (3.12)

that is, the geometric mean of the SNRs. However, this is not always possible, so we
instead employ a kind of max-min strategy. Afterward, power control can be used to
equalize the SNRs at the receivers, though at a lower level than the ideal of (3.12)
(see Proposition 2).

To be more precise, this max-min criterion says that the weakest performance
should be maximized, and then, given this, the second weakest should be maximized,
etc. Note that to find the ordering, it does not matter whether we consider SNR or
$\log_2(1 + \mathrm{SNR})$.
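One way to read this criterion is as a lexicographic comparison of the SNRs sorted
from weakest to strongest. The sketch below illustrates that reading; the helper names
and the three candidate SNR profiles are hypothetical.

```python
def max_min_key(snrs):
    """Sort ascending so that tuple comparison looks at the weakest SNR first."""
    return tuple(sorted(snrs))

def best_by_max_min(candidates):
    """Pick the SNR profile preferred by the max-min criterion."""
    return max(candidates, key=max_min_key)

# Hypothetical SNR profiles produced by three different orderings.
profiles = [(4.0, 1.0, 2.0), (2.0, 1.5, 2.5), (1.5, 3.0, 1.6)]
chosen = best_by_max_min(profiles)
```

The second and third profiles tie on the weakest SNR, so the second-weakest value
decides between them. Because the comparison only depends on the ranking of the
values, applying any increasing function such as log2(1 + SNR) to every entry leaves
the choice unchanged.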

Once  again,  the  exact  ordering  is  unknown  for  K  streams,  but  some  insights  can 
be  developed: 


Rule  3  (max-min)  For  two  streams  (K  =  2),  project  each  channel  vector  away 
from  the  other.  The  receiver  with  the  weaker  result  should  go  first. 


Note that the first receiver could still be the weaker of the two, even though it no
longer has to project away from the second. In either case, the result is worse if the
ordering is reversed.

Rule 4 (max-min) For K > 2 consider any two consecutive streams, indexed k and
k + 1. Project each channel vector away from the other and those of streams 1 through
k - 1. The receiver with the weaker result should be first, i.e., given index k.

This  follows  from  the  two-stream  case  because  all  other  streams  are  unaffected. 
This  also  suggests  a  “greedy”  algorithm,  which  obeys  the  rule  above  but  which  may 
not  necessarily  be  optimal: 

Algorithm  3  (max-min)  Project  all  receivers’  channel  vectors  away  from  every 
other.  Choose  as  the  last  receiver  the  one  with  the  strongest  result.  Next,  project 
all  remaining  channel  vectors  away  from  each  other.  Choose  as  the  second-to-last 
receiver  the  one  with  the  strongest  result,  etc. 
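Algorithm 3 can be sketched along the same lines. As before, this is an illustrative
implementation with hypothetical function names, assuming H is a list of channel
vectors (one per receiver); it is not the code behind the reported simulations.

```python
import math

def inner(u, v):
    """Inner product <u, v> with conjugation on the first argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def residual_gain(v, others):
    """Squared norm of v after projecting it away from the span of `others`."""
    basis = []
    for u in others:
        u = list(u)
        for q in basis:
            c = inner(q, u)
            u = [ui - c * qi for ui, qi in zip(u, q)]
        n = inner(u, u).real
        if n > 1e-12:                       # skip linearly dependent vectors
            basis.append([x / math.sqrt(n) for x in u])
    v = list(v)
    for q in basis:
        c = inner(q, v)
        v = [vi - c * qi for vi, qi in zip(v, q)]
    return inner(v, v).real

def greedy_max_min_order(H):
    """Build the ordering from last to first, as described in Algorithm 3."""
    remaining = list(range(len(H)))
    reversed_order = []
    while remaining:
        gains = {k: residual_gain(H[k], [H[j] for j in remaining if j != k])
                 for k in remaining}
        last = max(gains, key=gains.get)    # strongest after projection goes last
        reversed_order.append(last)
        remaining.remove(last)
    return reversed_order[::-1]

# Hypothetical two-receiver, two-antenna example.
order = greedy_max_min_order([[1.0, 1.0], [2.0, 0.0]])
```

For this small example the resulting order puts the receiver with the smaller channel
norm first, and both receivers end up with a squared gain of 2. Placing the stronger
receiver first would instead give squared gains of 4 and 1, with the same product.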


This algorithm produces some interesting results when followed by power control
that equalizes all the received SNRs. Shown in Fig. 3-7 and Fig. 3-8 are simulated
SNR outage distributions for an 8-element array and 8 and 7 streams, respectively.
In the second figure, which achieves a higher sum rate, the proposed algorithm has
close to the same distribution as the ideal (but perhaps unattainable) goal of (3.12).
For comparison, the ordering proposed to maximize sum capacity loses about 3 dB at
10% outage and 6 dB at 1% outage. Even greater gains are exhibited with 8 streams.
As the number of streams is decreased from 7, the effect of the ordering will become
less significant.

Figure 3-7: Outage probability, from simulations, for precoding of K = 8 streams
from an M = 8 element array using various ordering methods. We use power control
at an input SNR per link of 5 dB.

Through most of these cases, the outage curves from our algorithm are even
steeper than for a single stream with 8-level diversity (not shown). This can be explained
by a kind of averaging effect across the different receivers’ channel qualities. Of course,
maximizing the diversity in this way comes at some price in overall throughput. For
the system shown, the maximum sum capacity (reached at 6 streams) is 10.6 bits
per channel use, which is still more than double the 4.6 bits per channel use for
a single receiver. On the other hand, the sum rate of our proposed sum capacity
algorithm with waterfilling has the even higher value of 12.6 bits per channel use
(with 8 streams), which is only 2 bits below the bound with receiver cooperation.

Figure 3-8: Outage probability, from simulations, for precoding of K = 7 streams
from an M = 8 element array using various ordering methods. We use power control
at an input SNR per link of 5 dB.

The idea of the max-min ordering is the opposite of maximizing sum capacity;
we now boost the performance of the receivers that will have a more difficult time
communicating by placing them in the more privileged early positions. This suggests
an approximate algorithm of ordering the receivers in the reverse of their channel
strengths, regardless of the potential interference. As shown in the simulation, this
strategy improves significantly upon the max-sum ordering as well. A random ordering
(not shown) falls somewhere between these two.

3.2.3  Constellation  Design  to  Reduce  Precoding  Power  Loss 

We discussed earlier how transmitted symbols often have higher average power after
precoding than in the original constellation. One proposed method to overcome this
precoding power loss involved higher-dimensional lattice codes, distortion
compensation, and dither [84, 10]. However, these techniques can add considerable complexity
at both transmitter and receiver. This motivates a study of the importance of the
precoding power loss as well as other methods of compensating for it.

Using the example of uncoded QAM modulation, we find that precoding power
loss can vary by as much as 3 dB, depending on the particular alignment of signal
and interference. Both the greatest variation and worst-case losses occur for very
low-order modulations when there is only a single dominant interfering signal. For
these situations, we investigate ways of manipulating the symbol constellation that
attempt to ensure one of the better-case scenarios.

Bounds  on  the  Precoding  Power  Loss 

Suppose that a stream uses uncoded $N^2$-QAM modulation, with constellation symbols
spaced $2\zeta$ units apart. We will measure the precoding power loss by analyzing how
much more transmitted power is necessary for precoding than with a QAM
constellation at the same distance and no interference. The normalized minimum squared
distance $4\zeta^2/N_0$ will be our baseline for performance, as this has an approximate
correspondence with probability of symbol error, but is much easier to deal with.
We ignore the overall effective channel gain ($l_{kk} d_k$) here, as this only scales the
output. Also note that with some algebraic manipulation, our results can be converted
to instead compute the loss in minimum distance if the transmitted power is held
constant.

Depending on the particular input symbol and interference signal, the precoded
symbol can take on any value in the fundamental region
$\mathcal{A} = \{(-N\zeta, N\zeta] \times (-N\zeta, N\zeta]\}$.
To compute the precoding power loss for a particular interference realization, we
should average over all possible input symbol values. It is straightforward to show
that the average power for the input QAM symbol is smallest when the interference
point is centered between four neighboring constellation points (which includes the
case of no interference), while the worst case occurs when the interference coincides
with a constellation point. Because the interference is discrete-valued, it could
potentially always be at a best-case location (where we get the same performance as if there
had been no host), a worst-case location, or it may take on many values in between.
The range of average precoded symbol powers can be summarized as follows:

•  Best-case interference (or none): $\frac{2}{3}\zeta^2 (N^2 - 1)$

•  Uniformly-distributed interference: $\frac{2}{3}\zeta^2 N^2$

•  Worst-case interference: $\frac{2}{3}\zeta^2 (N^2 + 2)$

Figure 3-9: Precoding power loss, in terms of SNR gap from a scenario without any
interference. The points shown are for uncoded $N^2$-QAM input symbols.

The precoding power losses for the different interference possibilities are shown
in Fig. 3-9. The graph indicates that the potential loss is only significant for very
small constellations. Note that the pseudorandom dithering technique can be used
to always ensure that the interference looks uniformly distributed. If this is not done
and we encounter a worst-case alignment, the maximum precoding loss is 3 dB for
4-QAM modulation.
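The best-case and worst-case powers can be reproduced by direct enumeration. The
sketch below is illustrative and makes several assumptions: the constellation is square
with n points per dimension spaced 2*zeta apart, the interference takes a single value
whose real and imaginary parts both equal offset, and the modulo operation wraps each
dimension into an interval of width 2*n*zeta. All function names are hypothetical.

```python
import math

def wrap(x, half):
    """Wrap a real value into the interval (-half, half]."""
    return half - ((half - x) % (2 * half))

def avg_precoded_power(n, zeta, offset):
    """Average precoded power for n^2-QAM with interference (offset, offset)."""
    levels = [(2 * j - n + 1) * zeta for j in range(n)]   # PAM levels per dimension
    per_dim = sum(wrap(s - offset, n * zeta) ** 2 for s in levels) / n
    return 2 * per_dim                                    # two identical dimensions

def loss_db(n, zeta, offset):
    """Precoding power loss relative to the no-interference case, in dB."""
    baseline = avg_precoded_power(n, zeta, 0.0)
    return 10 * math.log10(avg_precoded_power(n, zeta, offset) / baseline)

worst_4qam_db = loss_db(2, 1.0, 1.0)    # interference on a constellation point
worst_16qam_db = loss_db(4, 1.0, 1.0)
```

With 4-QAM the worst-case alignment doubles the average power, which is the 3 dB
figure quoted above; with 16-QAM the same alignment costs well under 1 dB.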

Considering  these  trends,  we  would  like  to  know  if  something  can  be  done  to 
mitigate  this  loss  for  low-order  constellations,  in  effect  transforming  a  worst-case 
interference  value  to  something  better.  Because  the  worst-case  possibility  comes 
about  from  discrete-valued  interference  and  constellation  points,  it  makes  sense  to 
try adjusting the constellation design.

Constellation Design for 4-QAM Interference

We concentrate on embedding two bits per complex interference sample because we
saw that only very low-order constellations such as 4-QAM present a significant
potential for improvement. Before the constellation design stage, we need to look at the
distribution of the interference signal.

The interference signal will be some linear combination of the modulo-equivalent
symbols sent to earlier-ordered receivers, $s_1, \ldots, s_{k-1}$. The linear combination is
specified by the off-diagonal entries of the $G_p$ precoding matrix, corresponding to the
feedback loop in TH precoding. For later-ordered streams, this linear combination
of symbols can take on quite a great many different values, and will tend toward
the uniform distribution mentioned above (after the modulo is taken into account).
On the other hand, earlier-ordered streams will only see a linear combination of one
or two symbols, so the interference distribution will continue to look discrete, and
may at times hit the worst case. These are the situations where careful constellation
design is most needed and, due to the structure of the interference, where the most
can be done.

2-Bit  Signaling  with  4-QAM  Interference 

Let  us  start  with  the  simplest  case,  where  both  of  the  first  two  streams  use  2  bits  per 
complex  symbol.  Since  the  first  stream  sees  no  interference,  its  constellation  will  look 
like  standard  4-QAM.  The  second  stream  will  then  see  the  first  as  interference,  after 
a  gain  and  phase  shift.  Similar  interference  could  occur  for  later  streams  if  the  linear 
combination  of  earlier  symbols  heavily  favors  one  of  them  over  the  others. 

Assume that the transmitter phase-aligns the current input symbol with the
4-QAM interference. Consider first the two extreme cases of no interference and very
large interference. In either case, we can just send the new symbol s as is, resulting in
the received distributions shown in Fig. 3-10. Both result in the best-case performance
of no precoding power loss. The large-interference case is the same as superposition
coding [14], where the earlier message is strong enough that the receiver can determine
what point was sent, subtract (modulo) that out, and then detect the second message.
In this and the next figure, the precoded signal that was sent is the difference between
the interference point and the nearest “quantizer” point corresponding to the desired
symbol.
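The modulo embedding described here can be illustrated with a minimal sketch of a
standard TH-style precoder for a single complex symbol. It makes simplifying
assumptions: unit channel gain, no noise, and a modulo interval (-half, half] in each
dimension, with half = 2 for 4-QAM points at (+/-1, +/-1). The function names are
hypothetical.

```python
def wrap(x, half):
    """Wrap a real value into the interval (-half, half]."""
    return half - ((half - x) % (2 * half))

def th_precode(symbol, interference, half):
    """Transmit the modulo-reduced difference between symbol and interference."""
    d = symbol - interference
    return complex(wrap(d.real, half), wrap(d.imag, half))

def th_receive(precoded, interference, half):
    """The channel adds the interference; the receiver only applies the modulo."""
    y = precoded + interference
    return complex(wrap(y.real, half), wrap(y.imag, half))

# 4-QAM symbols at (+/-1, +/-1), so the modulo interval is (-2, 2].
x1 = th_precode(1 + 1j, 2.5 - 0.5j, 2.0)
x2 = th_precode(-1 - 1j, 3 + 3j, 2.0)
r1 = th_receive(x1, 2.5 - 0.5j, 2.0)
r2 = th_receive(x2, 3 + 3j, 2.0)
```

In both cases the receiver recovers the intended symbol without knowing the
interference, and the transmitted value stays inside the modulo interval no matter how
large the interference is.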

In practice, it is likely that something in between these two extremes will occur.
To adjust for this, first consider the large interference case of Fig. 3-10b. We could
relabel the constellation points and achieve the same, optimal minimum distance with
slightly smaller interference, as shown in Fig. 3-11c. Interestingly, this mapping can
be interpreted as a form of distortion compensation, where some of the “quantizer
error” from a standard TH precoder is added back in the form of self-noise.

Figure 3-10: Simple embedding for (a) very small or (b) very large interference.
Possible interference values are shown as •, and embedding points with ▷, o, and ×.

As the interference gets smaller, some of the constellation points merge, as in
Fig. 3-11b. At some point, we would expect to go back to the no-interference method,
shown in Fig. 3-11a. It turns out that this switch occurs when the interference has
half the magnitude of the symbol to be embedded. If the possible interference points
are spaced $2\zeta_I$ units apart, then straightforward calculations yield:

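•  Method of Fig. 3-11a:

$\text{average precoded power} = \begin{cases}
2\zeta^2 + 2\zeta_I^2, & 0 \le \zeta_I < \zeta, \\
2\zeta^2 + 2(2\zeta - \zeta_I)^2, & \zeta \le \zeta_I < 2\zeta, \\
\text{periodic}, & \text{subsequent ranges of } 2\zeta.
\end{cases}$

•  Method of Fig. 3-11b:

$\text{average precoded power} = \begin{cases}
2\zeta^2 + 2(\zeta - \zeta_I)^2, & 0 \le \zeta_I < \zeta, \\
2\zeta^2, & \zeta_I \ge \zeta.
\end{cases}$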
Figure 3-11: Modified versions of the embeddings in Fig. 3-10 for more
moderate-power interference.

For a particular interference distribution, the better of these two methods would be
chosen. As shown in Fig. 3-12, using this adaptive constellation instead of standard
TH precoding provides a gain of 1 to 3 dB over a relatively wide range of possible
interference powers. These include typical scenarios without power control when
there are two receivers and two or three transmit antenna elements (average relative
interference powers of 1 and 0.5, respectively).
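The choice between the two methods can be sketched numerically. The piecewise power
expressions coded below are an assumption: they follow from averaging a standard
modulo precoder over 4-QAM symbols at (+/-z, +/-z) and 4-QAM interference at
(+/-zi, +/-zi) for the first method, and from the stated switch point at zi = z/2 for the
second. The function names are hypothetical, and the gain is measured against always
using the first method.

```python
import math

def power_fig_a(z, zi):
    """Average precoded power of the no-interference-style embedding (period 2*z)."""
    r = zi % (2 * z)
    if r < z:
        return 2 * z**2 + 2 * r**2
    return 2 * z**2 + 2 * (2 * z - r)**2

def power_fig_b(z, zi):
    """Average precoded power of the relabeled (large-interference) embedding."""
    if zi < z:
        return 2 * z**2 + 2 * (z - zi)**2
    return 2 * z**2

def adaptive_gain_db(z, zi):
    """Power saved by picking the better method instead of always using the first."""
    fixed = power_fig_a(z, zi)
    adaptive = min(fixed, power_fig_b(z, zi))
    return 10 * math.log10(fixed / adaptive)

gain_equal_power = adaptive_gain_db(1.0, 1.0)    # interference as strong as the symbol
gain_at_switch = adaptive_gain_db(1.0, 0.5)      # the two methods coincide here
```

Under these assumptions the gain is about 3 dB when the interference spacing equals
the symbol spacing and vanishes at the switch point, where both methods need the
same power.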

This adaptive constellation does not need to significantly affect the complexity of
the receiver, which must distinguish among the embedding points corresponding to
different input symbols. Since all “o” points in Fig. 3-11c embed the same symbol
value, the receiver can treat the whole center region as a single decision region in its
slicer. The slicers for the constellations in Fig. 3-11b and c then form a continuum
parametrized by an overall gain. The receiver will have to use a separate slicer for
Fig. 3-11a, but it may be possible to determine which of the two to use based on the
distribution of received data.


2-Bit  Signaling  in  Larger-Order  QAM  Interference 

The interference will not always consist of only four possible points. This is
especially true because the first stream, which causes interference on later ones, will
typically have the best channel quality and is therefore more likely to use higher-order
modulation. The methods of the previous discussion can still be used, although the
description is more difficult and less dramatic gains are possible. This is discussed in
Appendix B.
Figure 3-12: Precoding power loss for the methods of Fig. 3-11, compared with using
a fixed constellation that does not depend on the interference distribution.

3.2.4 Channel Information


Precoding systems of the type described in this chapter involve a unique set of
processing and channel knowledge assumptions. In this section, we look at what information
is needed at the transmitter and receiver, and how this affects the processing and
performance.


Transmitter  Processing 


The  transmitter  needs  to  compute  the  beamforming  and  precoding  matrices  as  well 
as  appropriate  information  rates,  so  it  needs  to  know  the  channel  vectors  of  all  the 
receivers  to  which  it  is  currently  communicating.  We  have  discussed  how  this  can  be 
accomplished  either  through  feedback  from  the  receivers’  knowledge  or  by  training 
on  the  reverse  channel  in  time-division  duplex  systems. 

Errors in the channel estimation can lead to unintended interference at the
receivers. The modulo operation makes an exact error analysis difficult, but some
indicators of the sensitivity can be found. Recall that the beamforming step converts
the channel matrix H into a lower-triangular form using the LQ factorization. If the
true H differs from the estimated value by a perturbation, then the corresponding
perturbations in L and Q can be magnified by about $\kappa_2(H)$ [59], where $\kappa_2(H)$ is the
condition number of H,

\[
\kappa_2(H) = \frac{\sigma_{\max}}{\sigma_{\min}},
\]

the ratio of the largest to the smallest singular value of H.
This means that the precoding step will be based on a different L from the true one,
producing interference at the receivers. This effect can be lessened by ensuring that
H is well-conditioned. One way to do this is to send to fewer than the maximum
number of receivers. A different approach, discussed in Chapter 4, is to specifically
select receivers that lead to well-conditioned channel matrices.
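To make the role of the condition number concrete, the following minimal sketch computes $\kappa_2(H)$ for a two-receiver, two-antenna channel matrix using the closed-form singular values of a 2 x 2 matrix. The function name and the example matrices are illustrative assumptions, not values from the thesis.

```python
import math


def cond2_2x2(h):
    """2-norm condition number of a 2x2 (possibly complex) matrix.

    Uses sigma_max^2 = (F + sqrt(F^2 - 4|det|^2)) / 2 with F the squared
    Frobenius norm, and sigma_min = |det| / sigma_max.
    """
    (a, b), (c, d) = h
    frob_sq = abs(a) ** 2 + abs(b) ** 2 + abs(c) ** 2 + abs(d) ** 2
    det_abs = abs(a * d - b * c)
    disc = math.sqrt(max(frob_sq**2 - 4 * det_abs**2, 0.0))
    sigma_max = math.sqrt((frob_sq + disc) / 2)
    if det_abs == 0:
        return math.inf          # rank-deficient channel matrix
    sigma_min = det_abs / sigma_max
    return sigma_max / sigma_min


if __name__ == "__main__":
    orthogonal_rows = [[1, 1], [1, -1]]        # receivers with orthogonal channels
    nearly_parallel = [[1, 1], [1, 1.01]]      # receivers with almost equal channels
    print(cond2_2x2(orthogonal_rows))          # well-conditioned
    print(cond2_2x2(nearly_parallel))          # ill-conditioned
```

Receivers whose channel vectors are nearly parallel give a large condition number, which is the situation that receiver selection tries to avoid.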

Another analysis technique is to use a model for the perturbation. Suppose that
the true H differs from the estimated value by a $K \times M$ matrix $\Delta_1$ that has i.i.d.
complex Gaussian elements. The received vector before the additive noise will be

\[
(H + \Delta_1)\, Q^\dagger G_P D s = L G_P D s + \Delta_2 G_P D s,
\]

where $\Delta_2 = \Delta_1 Q^\dagger$ is a $K \times K$ matrix whose elements are i.i.d. Gaussian with
the same variance as those in $\Delta_1$. Since the first term on the right-hand side is
the intended output, the uncertainty in H apparently adds a "noise" term, which
for any particular input vector is Gaussian distributed. With perfect coding and
embedding, the vector of precoded symbols $G_P D s$ will look i.i.d. Gaussian, so the
characteristics of this noise term for each receiver can be computed; its variance will
clearly be proportional to the variance of the perturbation. Unfortunately, this does
not provide a complete characterization, since the precoded symbols are dependent
on the original symbols. For example, the constellation expansion in s will likely be
largest when the previous precoded symbols happen to be largest, suggesting that
the new noise term will be at its worst when there are large modulo terms.

To ensure numerical stability of the precoding algorithm, care must be taken in the
choice of LQ factorization. The straightforward implementation, basically the
Gram-Schmidt procedure, can lead to a severe loss in orthogonality among the beamforming
vectors [32]. The "modified Gram-Schmidt" procedure is more careful about internal
scaling, and $QQ^\dagger$ differs from identity by a matrix of approximate norm $\epsilon\,\kappa_2(H)$,
where $\epsilon$ is the machine precision. Once again, we see that using better-conditioned
channel matrices helps. A different LQ algorithm using Householder transformations
takes about twice the number of computations but achieves still better orthogonality
(approximately $\epsilon$ from identity).
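The difference between the two Gram-Schmidt variants can be seen on a small, deliberately ill-conditioned example. The sketch below is illustrative only: it uses real-valued rows for brevity, a standard textbook test matrix (not one from the thesis), and hypothetical function names.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormalize_rows(h, modified):
    """Gram-Schmidt on the rows of h (the Q factor of an LQ factorization).

    Classical GS projects the original row onto each earlier q; modified GS
    projects the running residual, which limits the growth of rounding errors.
    """
    q = []
    for row in h:
        v = list(row)
        for qi in q:
            coeff = dot(v if modified else row, qi)
            v = [a - coeff * b for a, b in zip(v, qi)]
        norm = math.sqrt(dot(v, v))
        q.append([a / norm for a in v])
    return q


def orthogonality_error(q):
    """Largest entry of |Q Q^T - I|."""
    n = len(q)
    return max(
        abs(dot(q[i], q[j]) - (1.0 if i == j else 0.0))
        for i in range(n)
        for j in range(n)
    )


if __name__ == "__main__":
    e = 1e-8  # small enough that 1 + e*e rounds to 1 in double precision
    h = [[1.0, e, 0.0, 0.0], [1.0, 0.0, e, 0.0], [1.0, 0.0, 0.0, e]]
    print("classical:", orthogonality_error(orthonormalize_rows(h, modified=False)))
    print("modified: ", orthogonality_error(orthonormalize_rows(h, modified=True)))
```

On this matrix the classical procedure loses orthogonality almost completely, while the modified procedure keeps the error near the product of machine precision and condition number, in line with the estimate quoted above.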

Receiver  Processing 

The relevant receiver processing consists mostly of locking on to the gain and phase of
its symbol stream and then detecting the symbols with a slicer. To do this effectively
requires estimating the complex effective channel gain, $l_k d_k$, or developing systems
that work around this step. It is at this level that spatial precoding presents some
unique channel knowledge and sensitivity issues.

Although this type of processing is common in digital communication systems,
many of the usual methods may not be appropriate for spatial precoding. For
example, cellular systems often use phase shift keying constellations that do not require
fine gain estimation. However, the modulo-equivalent constellation points of
precoding make gain control a necessity; even if we started with a 4-QAM constellation, the
modulo-extended version would expand to a constellation of higher order.
Point-to-point digital subscriber line (DSL) systems use a rather involved training phase that
allows the receiver to estimate its channel and determine the rates and gains on the
various subchannels [2]. Once again, however, our channel is different in that these
decisions must be made in a centralized manner based on all of the receivers' channels
and not each individually.

We  are  therefore  left  with  two  choices:  all  channel  information  can  be  distributed 
to  all  receivers,  or  the  transmitter  can  inform  or  train  the  receivers  for  their  individual 
complex  gains  and  rates.  The  second  option  appears  to  be  easier  and  involve  the 
transfer  of  less  information.  Because  different  constellations  can  look  the  same  under 
the  modulo  operation,  the  transmitter  could  inform  the  receivers  of  their  streams’ 
modulation  and  coding  scheme  using  a  few  highly-protected  symbols,  as  is  common 
for  this  type  of  header  information  (see,  e.g.,  [21]).  On  the  other  hand,  it  may 
make  sense  to  train  the  receivers  on  the  complex  gain  so  that  they  can  do  their  own 
adaptive  gain  control  and  continue  to  adjust  it  as  the  channel  varies  slightly  from  the 
transmitter’s  estimates. 

To see the importance of an accurate gain estimate, consider a modulo-extended
4-QAM constellation, where the constellation points have odd real and imaginary
integer coordinates. The upper-right constellation point (the triangle in Fig. 3-3) will
have real coordinate $4n + 1$ for some integer n. If the receiver multiplies its input by
too large or too small a gain before slicing, then it could cause an error in the
modulo-extended slicer even in the absence of noise. For instance, for nonnegative n, multiplying
by a gain that is a factor

\[
\frac{4n + 2}{4n + 1}
\]

too large will cause an error. Note that this gets steadily stricter with more severe
constellation expansions: 2, 6/5, 10/9, etc. This type of effect happens for a slicer on
any higher-order constellation; the new wrinkle here is that the modulo makes a
low-order constellation act like a higher-order one. This provides another argument for
choosing well-conditioned channel matrices, since this will help limit the constellation
expansion.
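A short sketch of this gain sensitivity, restricted to one real coordinate of the modulo-extended 4-QAM constellation. The slicer below is a simplified stand-in written for this illustration (names are hypothetical); it folds the input into the base interval and decides between the two symbol values.

```python
from fractions import Fraction


def slice_mod4(x: float) -> int:
    """Modulo-extended slicer for one coordinate with points at odd integers.

    Points 4n + 1 represent +1 and points 4n - 1 represent -1.
    """
    folded = (x + 2.0) % 4.0 - 2.0   # fold into [-2, 2)
    return 1 if folded >= 0 else -1


def max_gain_factor(n: int) -> Fraction:
    """Gain error factor at which the point 4n + 1 reaches the boundary 4n + 2."""
    return Fraction(4 * n + 2, 4 * n + 1)


if __name__ == "__main__":
    for n in range(3):
        point = 4 * n + 1
        limit = float(max_gain_factor(n))
        just_under = slice_mod4(point * limit * 0.999)
        just_over = slice_mod4(point * limit * 1.001)
        print(n, max_gain_factor(n), just_under, just_over)
```

The tolerable gain error shrinks as n grows, so a receiver whose symbols undergo large constellation expansion needs a correspondingly accurate gain estimate.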

Spatial precoding does provide some immunity to the amplifier saturation problem
that can occur in typical TH precoding systems. These issues come up because gain
control is often done with linear amplifiers or other devices that only provide good
results over a limited range of inputs. With TH precoding, the modulo-equivalent
symbols that are received can sometimes be large and may cause the input to go
beyond this range [9]. Fortunately for spatial precoding, the symbols that are most
likely to undergo large constellation expansion, belonging to the later-ordered
receivers, are also attenuated the most by the effective channel. To be more specific,
assume a random ordering and no power control. This means that the vector of
precoded symbols, $G_P D s$, will be of equal maximum power. Through beamforming
and the actual channel, this vector will be multiplied by the lower-triangular matrix
$L = H G_B$. The first receiver will therefore get the first precoded symbol multiplied
by $l_1$, whose power was previously determined to have an Erlang distribution with
M degrees of freedom. The second receiver will get a mixture of the first precoded
symbol and its own. Its own precoded symbol will be multiplied by a smaller factor
than before, with $M - 1$ degrees of freedom, but the other symbol will arrive with
Erlang-distributed power with 1 degree of freedom. If the two symbols add up
coherently, then the overall maximum power has the same distribution as at the first
receiver. This continues: the kth receiver will get a superposition of its own symbol
with an Erlang-distributed power distribution with $M - k + 1$ degrees of freedom, and
$k - 1$ other symbols, each with first-order Erlang. Therefore, the maximum power
will have the same distribution for every receiver regardless of its ordering placement.


3.3 Precoding for Combined Multiuser and Intersymbol Interference

This  chapter  has  been  primarily  about  taking  techniques  previously  used  to  combat 
intersymbol  interference  and  applying  them  to  cross-channel  interference  between 
streams  intended  for  different  receivers.  One  would  expect  that  when  both  types  of 
interference  are  present,  a  generalization  of  these  methods  should  follow. 

One approach, used by Ginis and Cioffi in [30], is to perform a discrete multitone
transform (DMT) to convert the time-dispersive channels into a number of parallel,
one-tap channels. Multiuser precoding can then be performed on each of these
subchannels. We seek a more unified treatment, using precoding directly for canceling all
interference. This will take the form of two separate algorithms, representing causal
and noncausal processing, depending on the order in which interference is canceled.
After we develop our single-tone algorithms, we compare them with the DMT-based
methods.
3.3.1 Multiple-Receiver Dispersive Channels

In describing these more general channel models, we take some of our notation from
the multiple-input, multiple-output (MIMO) model of [70]. A discrete-time M-input,
K-output linear time-invariant (LTI) system can be represented by a $K \times M$ matrix
$H(z)$, whose entries $H_{km}(z)$ are the z-transforms of the channel from the mth input
to the kth output. If the regions of convergence all contain the unit circle, then a
matrix Fourier transform can be similarly defined. We define the "†" operator
to perform a conjugate transpose and additionally reverse the time sequence, so

\[
H^\dagger(z) = \left[ H(1/z^*) \right]^{\mathsf{H}},
\]

where the superscript $\mathsf{H}$ denotes the ordinary conjugate transpose.

One special type of MIMO system is called paraunitary. This means that

\[
H^\dagger(z) H(z) = c \cdot I,
\]

a scaled identity matrix. If $H(z)$ is defined on the unit circle, then we can similarly
define a lossless system as a causal, stable system for which

\[
\left[ H(e^{j\omega}) \right]^{\mathsf{H}} H(e^{j\omega}) = c \cdot I.
\]

This is the MIMO analogue to an allpass filter. It turns out that when $H(z)$ is defined
on the unit circle, then the two equations above imply each other, so a lossless system
is the same as a causal, stable paraunitary system.

The basic channel model, before any processing, is

\[
y(z) = H(z) x(z) + w(z),
\]

where $y(z)$ are the channel outputs and $w(z)$ represents a realization of the white
Gaussian noise sequence. We will assume that all the entries in the channel matrix
$H(z)$ are causal and stable, but not necessarily minimum phase. For a precoding
system, the antenna inputs $x(z)$ are the precoded and beamformed symbols, so we
have

\[
y(z) = H(z) G_B(z) G_P(z) D s(z) + w(z).
\]

$G_P(z)$ is the precoding matrix, chosen so that $H(z) G_B(z) G_P(z)$ is diagonal, where
each diagonal element of this product consists of only a single tap that is not a function
of z. This way, a receiver sees no interference across time or from other streams. We
also set the diagonal elements of the precoding matrix $G_P(z)$ to be monic, which
moves all power control into D.

The beamforming matrix, $G_B(z)$, specifies the transformation between precoded
symbols and antenna inputs. In our discussion on flat fading channels, we constrained
the elements of this matrix to be single-tap filters. In our extended model, we now
allow each "beamforming weight" to be an LTI filter. Therefore, each antenna element
output will be a linear combination of filtered precoded symbols from the different
streams. We will also impose an orthogonality constraint on the beamforming (as we
did in Section 3.1.2 for flat fading channels), so that the beamforming matrix $G_B(z)$
must be paraunitary with c = 1,

\[
G_B^\dagger(z) G_B(z) = I. \tag{3.13}
\]

Note that this imposes an orthogonality across both time and different streams.

For flat fading channels, $G_P$ was made to be lower triangular. This was necessary
so that the intertwined constellation expansion and matrix multiplication operations
of the precoder could be performed recursively over the different streams. In the
more general model, we must precompensate for interference across both streams and
time. We will find that this leads to more than one type of constraint on $G_P(z)$, each
corresponding with a different sequence of interference cancellation operations.

3.3.2 Canceling Multiuser Interference with Causal Processing

The kth row of $H(z) G_B(z)$ determines the linear combination of symbols that the
kth receiver would see from both its own and other streams, if the precoding step had
been omitted. With precoding, the coefficients of every power of z in each entry of this
row will multiply some precoded symbol. If all of the precoded symbols corresponding
to nonzero coefficients in this row are known when the current symbol is ready to be
processed, then the interference can be computed and subtracted off so that the
receiver will get only the desired symbol (or a modulo-equivalent version).

The ordering in which symbols are processed, with respect to both time and the
different streams, will determine the necessary structure of $H(z) G_B(z)$. Suppose
for now that one symbol from the first stream is precoded, then one from the next
stream, etc., until all of the symbols at time n = 0 have been processed. Next comes
the n = 1 symbol of the first stream, and so forth. A graphical view of this processing
of symbols is shown in Fig. 3-13a.

Figure 3-13: Order of processing symbols for precoding. Panels (axis: streams): (a) Causal algorithm of Section 3.3.2. (b) Noncausal algorithm of Section 3.3.3.

For the kth stream to precode its current symbol using this procedure, it needs
to know the past precoded symbols of all streams and the present precoded symbols
of the streams with lower indices. This means that in the kth row of $H(z) G_B(z)$,
the first k entries should be causal, and the last $K - k$ entries should contain only
negative powers of z. Since $H(z)$ is already causal, we just need to triangularize the
set of zero-lag taps. This can be done by collecting them into a matrix H, performing
an LQ decomposition H = LQ on that, and using $Q^\dagger$ as the beamforming matrix
$G_B(z)$. In this way, the beamforming matrix still ends up as a set of single-tap,
zero-lag filters, even though this was not specified a priori.
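The processing order of Fig. 3-13a can be sketched as a recursive precoder. The following is a minimal illustration under simplifying assumptions that are not from the thesis: real-valued symbols in {-1, +1}, a modulo-4 operation, and an effective channel `taps` (a hypothetical name) already normalized so that its zero-lag matrix is lower triangular with unit diagonal.

```python
def mod4(x: float) -> float:
    """Fold x into [-2, 2), the base region for symbols at -1 and +1."""
    return (x + 2.0) % 4.0 - 2.0


def received(precoded, taps, n, k):
    """Noise-free output of receiver k at time n; taps[m][k][j] is the lag-m tap
    of the effective channel from stream j to receiver k."""
    total = 0.0
    for m, tap in enumerate(taps):
        if n - m >= 0:
            total += sum(tap[k][j] * precoded[n - m][j] for j in range(len(tap)))
    return total


def precode_causal(symbols, taps):
    """Precode time by time and, within each time, stream by stream."""
    num_streams = len(symbols[0])
    precoded = []
    for n, s_n in enumerate(symbols):
        precoded.append([0.0] * num_streams)      # unknown symbols count as zero
        for k in range(num_streams):
            # With x_k[n] still zero, the channel output is exactly the interference
            # from past symbols of all streams and present symbols of streams j < k.
            interference = received(precoded, taps, n, k)
            precoded[n][k] = mod4(s_n[k] - interference)
    return precoded


if __name__ == "__main__":
    taps = [
        [[1.0, 0.0], [0.6, 1.0]],     # zero-lag taps: lower triangular, unit diagonal
        [[0.5, -0.3], [0.4, 0.7]],    # lag-1 taps: unrestricted
    ]
    symbols = [[1, -1], [-1, -1], [1, 1], [-1, 1]]
    x = precode_causal(symbols, taps)
    decoded = [[mod4(received(x, taps, n, k)) for k in range(2)] for n in range(4)]
    print(decoded)
```

Each receiver recovers its symbol by applying the same modulo operation to its noise-free output, since the precoder has removed everything except a multiple of the modulo interval.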

The SNR performance of this method is straightforward to calculate, since only
the diagonal, zero-lag terms of $H(z) G_B(z) G_P(z)$ contribute to the received signal.
Using the same reasoning as in Section 3.1.3, the received SNR for stream k (to first
order with TH precoding, or exactly with optimal information embedding) becomes

\[
\mathrm{SNR}_k = \frac{\mathcal{P}\, |l_k d_k|^2}{\mathcal{N}_0},
\]

where $l_k$ is the kth diagonal entry of L.
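As a rough sketch of the causal method, the code below triangularizes a matrix of zero-lag taps with a Gram-Schmidt LQ decomposition and evaluates the SNR expression above. The function names, the example tap values, and the choice of unit power scaling are assumptions made for illustration.

```python
import math


def lq_decompose(h):
    """Return (l, q) with h = l q, q having orthonormal rows, l lower triangular.

    Uses modified Gram-Schmidt on the rows of h, which may be complex.
    """
    q = []
    for row in h:
        v = list(row)
        for qi in q:
            coeff = sum(a * b.conjugate() for a, b in zip(v, qi))
            v = [a - coeff * b for a, b in zip(v, qi)]
        norm = math.sqrt(sum(abs(a) ** 2 for a in v))
        q.append([a / norm for a in v])
    # l = h q^dagger: entry (k, j) is the inner product of row k of h with q_j
    l = [[sum(a * b.conjugate() for a, b in zip(row, qi)) for qi in q] for row in h]
    return l, q


def received_snr(l, d, p_over_n0):
    """SNR_k = (P / N0) |l_k d_k|^2, using the diagonal entries of l."""
    return [p_over_n0 * abs(l[k][k] * d[k]) ** 2 for k in range(len(l))]


if __name__ == "__main__":
    # Hypothetical zero-lag taps for K = 2 receivers and M = 3 antenna elements.
    h0 = [[1 + 1j, 0.5, -0.2j], [0.3, 1 - 0.5j, 0.8]]
    l, q = lq_decompose(h0)
    print(received_snr(l, d=[1.0, 1.0], p_over_n0=1.0))
```

The first receiver's SNR equals the full squared norm of its zero-lag channel vector, while later receivers only keep the part of their vector that is orthogonal to the earlier ones.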

Note that performance-wise, the terms of $H(z)$ with negative powers of z were
essentially ignored, with their energy getting canceled in the precoding. This is fine
when most of the energy in the channel responses resides in the zero-lag terms, but
this is not generally the case.

A solution to this problem for single-user ISI channels is to use a whitened matched
filter front end to make the channel minimum phase. Given an allpass constraint, this
filter forces as much energy as possible into the first tap. For the multiple-receiver ISI
channel, we might like to make all matrix entries minimum phase, but unfortunately
there is no way to filter all of the entries of $H(z)$ independently. Even if this were
possible, it is not clear that it would necessarily lead to the largest SNRs. What we
need is a more general $L(z)Q(z)$ decomposition of $H(z)$ that concentrates as much
energy as possible at the front of the final responses.

3.3.3  Precoding  with  Noncausal  Filtering 

For a scalar channel with impulse response $h^\dagger(z)$, the whitened matched filter starts
with a matched filter $h(z)$, resulting in the conjugate-symmetric response $h^\dagger(z)h(z)$.
This is followed by a filter that makes the overall response minimum phase (and
makes the combined filter allpass). These ideas can be extended to transmit arrays,
and eventually, multiple users.

Let us start with a single-user example, with M transmit antenna elements. This
receiver's channel model is

\[
y_1(z) = h_1^\dagger(z) g_1(z) s_1(z) + w_1(z),
\]

where the elements of $h_1^\dagger(z)$ are assumed to be causal. Ignoring the paraunitary
constraint (3.13) for now, the received signal energy is maximized by making $g_1(z)$
proportional to a bank of matched filters,

\[
g_1(z) = \gamma_1(z) h_1(z),
\]

for some scalar filter $\gamma_1(z)$. This turns the vector channel into a scalar channel, with
only a single scalar filter left to be determined.

The power constraint comes down to

\[
g_1^\dagger(z) g_1(z) = 1,
\]

which is now equivalent to

\[
\gamma_1^\dagger(z) \gamma_1(z) h_1^\dagger(z) h_1(z) = 1. \tag{3.14}
\]

Finding $\gamma_1(z)$, then, is equivalent to finding a whitening filter for a random
process with autocorrelation $h_1^\dagger(z) h_1(z)$. From well-known results in statistical signal
processing [11], if $h_1^\dagger(z) h_1(z)$ is factorizable, then it has a canonical form,

\[
h_1^\dagger(z) h_1(z) = |c_1|^2\, t_1^\dagger(z)\, t_1(z), \tag{3.15}
\]

where $t_1(z)$ is causal, monic, and minimum phase. The technical conditions for
factorizability are that both $\|h_1(e^{j\omega})\|^2$ and $\ln \|h_1(e^{j\omega})\|^2$ are integrable over $-\pi < \omega < \pi$.
These conditions hold for many functions $h_1(z)$ of interest, such as FIR and rational
z-transforms. The constant $c_1$ can be found with

\[
\ln |c_1|^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} \ln \|h_1(e^{j\omega})\|^2 \, d\omega.
\]
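The log-integral for $|c_1|^2$ is easy to evaluate numerically for FIR channels. The sketch below approximates the integral by an average over a uniform frequency grid; the function name, the grid size, and the example channels are assumptions made for illustration.

```python
import cmath
import math


def c1_squared(channel_taps, num_points=4096):
    """Approximate |c_1|^2 = exp( (1/2pi) * integral of ln ||h_1(e^{jw})||^2 dw ).

    channel_taps[m] lists the FIR taps (lag 0, 1, ...) of the channel from
    antenna element m to the receiver.
    """
    log_sum = 0.0
    for i in range(num_points):
        w = -math.pi + 2 * math.pi * i / num_points
        norm_sq = 0.0
        for taps in channel_taps:
            response = sum(t * cmath.exp(-1j * w * lag) for lag, t in enumerate(taps))
            norm_sq += abs(response) ** 2
        log_sum += math.log(norm_sq)
    return math.exp(log_sum / num_points)


if __name__ == "__main__":
    print(c1_squared([[1.0, -0.5]]))               # monic, minimum phase
    print(c1_squared([[1.0, -2.0]]))               # monic, maximum phase
    print(c1_squared([[1.0, -0.5], [1.0, 0.5]]))   # two antenna elements
```

For a single monic minimum-phase channel the constant is 1, since the zero-lag tap already holds all the energy that precoding can use; a zero outside the unit circle raises the constant because the factorization reflects that zero inside.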


In general, the solution for $\gamma_1(z)$ in (3.14) is not unique, but can contain factors
from both $t_1(z)$ and $t_1^\dagger(z)$. From a total SNR standpoint, any of these solutions would
give equal performance. However, because we want to use this system for precoding,
the equivalent channel $h_1^\dagger(z) g_1(z)$ should also be causal and minimum phase. This
means that any non-minimum-phase factors must be removed, so we set

\[
g_1(z) = \frac{h_1(z)}{c_1^*\, t_1^\dagger(z)} \tag{3.16}
\]

and get the equivalent channel

\[
y_1(z) = c_1 t_1(z) s_1(z) + w_1(z). \tag{3.17}
\]

We call this solution "noncausal" because the filter in (3.16) is in general not causal.

At this stage, no optimality has been lost by using this vector whitened matched
filter. In fact, a frequency-domain version of this type of single-user, transmit array
processing was derived by Zangi and Krasny [85], but without the minimum-phase
constraint. Instead of precoding, they assumed an optimal receiver and showed that
this system reaches the channel capacity if waterfilling across frequency is also
performed.

We, on the other hand, choose to do transmitter precoding, so performance is
determined by the zero-lag term of the equivalent channel. Since $t_1(z)$ is monic, the
received SNR is $|c_1|^2 \mathcal{P}/\mathcal{N}_0$. Before we go on to multiple-receiver generalizations, it
is useful to go over an example, and also ask whether this method is a significant
improvement over the causal method of the previous section.


Comparison  with  Causal  Precoding 

As a simple example, take a two-antenna system, with monic channels

\[
h_1^\dagger(z) = \begin{bmatrix} 1 - \alpha z^{-1} & 1 - \beta z^{-1} \end{bmatrix}.
\]

Let the input SNR be $\mathcal{P}/\mathcal{N}_0 = 1$. The causal processing method of Section 3.3.2
ignores the $z^{-1}$ terms for the beamforming part and will simply combine the two
channels, each with weight $\sqrt{2}/2$, to get the composite channel

\[
\sqrt{2} - \frac{\sqrt{2}}{2} (\alpha + \beta) z^{-1}.
\]

Since the performance is determined by the zero-lag term, this system will always
have a received SNR of 2, regardless of the values of $\alpha$ and $\beta$.

The noncausal solution of this section instead first performs a matched filter and
attempts the spectral decomposition of (3.15). Expanding this formula out, we get

\[
\begin{aligned}
h_1^\dagger(z) h_1(z) &= -(\alpha^* + \beta^*) z + (2 + |\alpha|^2 + |\beta|^2) - (\alpha + \beta) z^{-1} \\
&= |c_1|^2 (1 - d z^{-1})(1 - d^* z)
\end{aligned}
\]

for some constant d. Since it is $|c_1|^2$ that determines the SNR, we solve for it
algebraically:

\[
|c_1|^2 = \frac{2 + |\alpha|^2 + |\beta|^2}{2} + \frac{1}{2} \sqrt{4 + |\alpha|^4 + |\beta|^4 + 2|\alpha|^2 |\beta|^2 - 8\,\mathrm{Re}\{\alpha \beta^*\}}. \tag{3.18}
\]

We see immediately that if $|\alpha|^2 + |\beta|^2 > 2$, the noncausal method will perform at least
as well as the causal method described earlier, and will be much better as $|\alpha|^2 + |\beta|^2$
increases. This is not surprising, because it means that at least one of the two channels
was not minimum phase, so pushing energy toward the beginning of the responses
will help the zero-forcing precoder. At first this situation may seem trivial, since the
transmitter could individually filter the two channels to be minimum phase. However,
when we move on to multiple receivers, recall that in general, it is not possible for the
transmitter to make all of the channels for all of the receivers minimum phase.

When $|\alpha|^2 + |\beta|^2 < 2$, then for the noncausal method to be better, we need
whatever is under the square root sign in (3.18) to be larger than $[2 - (|\alpha|^2 + |\beta|^2)]^2$.
Subtracting this number from what is under the square root sign, we get

\[
4|\alpha|^2 + 4|\beta|^2 - 8\,\mathrm{Re}\{\alpha \beta^*\} = 4|\alpha - \beta|^2,
\]

which is always nonnegative, and equal to zero only when $\alpha = \beta$, that is, when the
two channel vectors are the same. Therefore, the noncausal precoding never does
worse than the causal method, and almost always does better.

Even when both channels are minimum phase, it turns out that the received SNR
of the noncausal method can be higher by as much as a factor of two. (This happens
when $\alpha$ and $\beta$ are near the unit circle and differ in phase by $\pi$.) It can also be shown
that this noncausal filtering method does no worse than the causal method for any
two-antenna, two-tap system, whether or not the channels are monic or minimum
phase.
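The comparison in this example can be checked numerically. The sketch below evaluates the closed form (3.18) for a few choices of the two channel zeros and compares it with the causal SNR of 2, assuming unit input SNR; the function name and test values are illustrative.

```python
import math

CAUSAL_SNR = 2.0  # causal method with unit input SNR, independent of alpha and beta


def noncausal_snr(alpha, beta):
    """|c_1|^2 from (3.18) for the channels 1 - alpha z^-1 and 1 - beta z^-1."""
    a = abs(alpha) ** 2 + abs(beta) ** 2
    # 4 + |alpha|^4 + |beta|^4 + 2|alpha|^2|beta|^2 - 8 Re{alpha beta*}
    under_root = 4 + a**2 - 8 * (alpha * beta.conjugate()).real
    return (2 + a + math.sqrt(max(under_root, 0.0))) / 2


if __name__ == "__main__":
    cases = [
        (0.5, 0.5),                # identical channels: no gain
        (0.99, -0.99),             # near unit circle, opposite phase: close to 2x
        (0.3 + 0.4j, -0.2j),       # generic minimum-phase pair
        (1.5, 0.2),                # one non-minimum-phase channel
    ]
    for alpha, beta in cases:
        print(alpha, beta, noncausal_snr(alpha, beta) / CAUSAL_SNR)
```

The ratio printed for each case is the SNR gain of the noncausal method over the causal one for this two-antenna, two-tap example.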


Multiple  Streams 


Once again, (3.16) provides the filtering to be done on the first stream. What must
be done with the second stream? If we use the same method, then this will cause
interference to the first receiver, which should be avoided.

Conceptually, we can perform a similar procedure to what was done with flat
channels and make use of a kind of LQ decomposition. Previously, this amounted to
the Gram-Schmidt procedure of finding a set of orthogonal vectors Q that span the
same space as those in H. Now, instead of letting $g_2(z)$ be proportional to $h_2(z)$ as
for the first receiver, we need to make it proportional to the component of $h_2(z)$ that
is orthogonal to $h_1(z)$. From linear algebra, this component can be written as

\[
h_2(z) - \frac{h_1^\dagger(z) h_2(z)}{h_1^\dagger(z) h_1(z)}\, h_1(z).
\]

(One can also think of the above as a separate orthogonalization for each frequency.)
We now need to normalize this function and make the overall response minimum
phase. With a little algebra, the autocorrelation is shown to be

\[
\frac{\left( h_1^\dagger(z) h_1(z) \right) \left( h_2^\dagger(z) h_2(z) \right) - \left( h_1^\dagger(z) h_2(z) \right) \left( h_2^\dagger(z) h_1(z) \right)}{h_1^\dagger(z) h_1(z)}.
\]

We know from the previous subsections that the denominator has the canonical
factorization $|c_1|^2 t_1^\dagger(z) t_1(z)$. Let the numerator factorization be denoted $|c_2|^2 t_2^\dagger(z) t_2(z)$.
Then, once we normalize out the maximum-phase terms, we get

\[
g_2(z) = \frac{\left( h_1^\dagger(z) h_1(z) \right) h_2(z) - \left( h_1^\dagger(z) h_2(z) \right) h_1(z)}{c_2^*\, t_2^\dagger(z)\, c_1 t_1(z)}. \tag{3.19}
\]


The set of beamforming vectors

\[
G_B(z) = \begin{bmatrix} g_1(z) & g_2(z) \end{bmatrix}
\]

now satisfies the paraunitary constraint. Using (3.16) and (3.19), we see that the new
effective channel becomes

\[
H(z) G_B(z) = \begin{bmatrix} c_1 t_1(z) & 0 \\[4pt] \dfrac{h_2^\dagger(z) h_1(z)}{c_1^*\, t_1^\dagger(z)} & \dfrac{c_2 t_2(z)}{c_1 t_1(z)} \end{bmatrix}, \tag{3.20}
\]

and the SNR at the second receiver will be

\[
\frac{\mathcal{P}}{\mathcal{N}_0} \frac{|c_2 d_2|^2}{|c_1|^2}.
\]

The procedure for adding yet more streams follows easily, though we omit the details
here since the equations become more cluttered.


Operation  of  Precoding  Algorithm 

It is worth taking a minute to consider the precoding algorithm implied by the effective
channel of (3.20). Recall that $t_k(z)$ are monic and minimum phase, while $h_k^\dagger(z)$ are
causal. This means that the diagonal elements will be causal, but the entries below
the diagonal will not. What does this say about how the precoding operation must
proceed?

Imagine that the transmitter wants to precode the current data symbol of stream
k. It needs to compute the interference that will appear for this data symbol, subtract
it out, then perform a modulo on the result. From the structure of (3.20), the
interference will depend on this stream's own past precoded symbols, and past, present,
and future precoded symbols of earlier streams. Therefore, before the transmitter
can precode this stream, it needs to wait for all earlier streams to be precoded. What
results is the algorithm flow of Fig. 3-13b. The transmitter precodes all the symbols
of the first stream, then all the symbols of the second stream, etc. Realistic
implementations would probably truncate the responses of (3.20), so that the processing
of each stream only needs to stay a specific number of symbols ahead of the next one.

Figure 3-14: Simulated outage probability for received SNR for the second of two
receivers. We compare the methods of Section 3.3.2 and Section 3.3.3, with a
4-element array, 4 i.i.d. Rayleigh-distributed taps each of variance 0 dB, $\mathcal{P}/\mathcal{N}_0$ = 0 dB,
and equal power distributed between the two receivers.

Performance 

We expect that this noncausal precoding method will exhibit a performance
improvement over the causal method of Section 3.3.2, which only takes advantage of the first
tap of each channel filter and simply cancels energy from the other taps. The
simulated outage curves of Fig. 3-14, for a 4-element array and four taps per channel, bear
this out. Shown is the performance for the second of two receivers; the first receiver's
performance is similar but is 1 to 2 dB higher. The causal curve is the same as the
usual third-order diversity for flat channels. (Recall that with M antenna elements,
the kth receiver gets diversity order $M - k + 1$.) Because the filtering in the
noncausal method attempts to use energy from all the taps, it achieves not only better
average performance, but also has smaller tails resulting in a sharper outage curve.
At low outage, the gain is almost 10 dB. Even the noncausal method cannot gain
back all of the energy from all of the taps, but it does come close: the first stream's
mean received SNR is 90 percent of the matched filter bound. Simulations for ergodic
capacity are given in the next section.

3.3.4  Comparison  with  DMT  Method 

The DMT-based method of Ginis and Cioffi [30] takes a very different approach, as
summarized in Fig. 3-15. Each stream is broken into blocks of N symbols, and each
block is put through an inverse discrete Fourier transform and then prepended with
a cyclic prefix before being sent through the channel. A receiver waits for the entire
block to be received and takes the N-point discrete Fourier transform (DFT). The
overall effect is to transform the ISI channel into a series of N parallel single-tap
channels, with the taps equal to the N-point DFT coefficients of the channel impulse
response. For this to work, the cyclic prefix, which carries no useful information,
must be as long as the ISI. In the context of the multiple-receiver problem, this
whole procedure transforms the H(z) matrix into N parallel matrices with single-tap
entries. Now, the beamforming/precoding procedure for flat fading channels can be
applied to each of these separately. We will continue to assume that the system uses
the precoding method that results in zero interference.
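
The diagonalizing effect of the cyclic prefix can be checked numerically. The following is a minimal sketch for a single noiseless link, not the simulation code behind the figures; the function names, the four-tap channel, and the block size of 8 are illustrative assumptions.

```python
import cmath

def dft(x):
    """Unnormalized N-point DFT, written out directly for clarity."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse of dft(), including the 1/N factor."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def dmt_link(symbols, taps):
    """Send one block over a noiseless ISI channel; return the receiver's DFT output."""
    n, prefix_len = len(symbols), len(taps) - 1
    block = idft(symbols)
    sent = block[n - prefix_len:] + block            # prepend the cyclic prefix
    received = [sum(taps[l] * sent[t - l] for l in range(len(taps)) if t - l >= 0)
                for t in range(len(sent))]           # linear convolution with the channel
    return dft(received[prefix_len:])                # discard the prefix, then N-point DFT

# Hypothetical example: 8 QPSK symbols through a 4-tap channel.
symbols = [1+1j, -1+1j, 1-1j, -1-1j, 1+1j, 1-1j, -1+1j, -1-1j]
taps = [1.0, 0.5, -0.3j, 0.1]
tone_gains = dft(taps + [0.0] * (len(symbols) - len(taps)))
received = dmt_link(symbols, taps)
max_error = max(abs(r - g * s) for r, g, s in zip(received, tone_gains, symbols))
```

Each tone sees only a scalar gain equal to the corresponding DFT coefficient of the impulse response, so `max_error` is at the level of floating-point rounding. Applying the same step to every transmitter-receiver pair would yield one single-tap matrix per tone.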

A comparison between our precoding method with noncausal filtering and the
DMT-based method reveals many features of the classic single-tone versus multitone
discussion that has traditionally centered around scalar ISI channels. One way to
think of frequency-selective channels with Gaussian noise is as an infinite number of
parallel channels at different frequencies that can be optimized separately. Multitone
methods try to approximate this with a finite number of parallel channels N, and
break up the input into this many substreams. Different power, modulation, coding,
and now, precoding, can be used on the different substreams. Single-tone methods
instead encode the whole stream together and use a filter to spread each symbol over
all frequencies so that the transmit spectrum is optimal. Multitone methods suffer
from the overhead of the cyclic prefix and from the approximation inherent in using
a finite number of subchannels. Single-tone methods typically lead to more complex
receivers. This complexity, along with the necessity of receiver cooperation, was
alleviated for the most part by using precoding, but this solution leads to its own
set of issues.

Figure 3-15: Block diagrams for DMT method with two receivers. Shown are the
processing at (a) the transmitter and (b) a typical receiver.

For both methods, precoding forces the ordering among streams to be set at the
transmitter. In the multitone solution, this ordering can be done separately for each
subchannel. This is not quite the same as being able to reorder for each frequency,
both because of the finite number of subchannels and because the DMT causes some
leakage of energy across frequency bands. As the block size gets large, these issues
should disappear. In any case, there is more control over the ordering than in the
single-tone solution. There, the channel triangularization was performed over the whole
band, so the same ordering is used over all frequencies.

Figure 3-16: Ergodic sum capacity across two receivers for the different methods.
Parameters are the same as for Fig. 3-14, except the input SNR, $\mathcal{P}/\mathcal{N}_0$, is made
variable. The DMT method used 32 tones, and the rate penalty from the cyclic
prefix is ignored.

Another issue is the manner in which the interference is handled. Both methods
use zero-forcing precoding to eliminate the interference across the different streams.
For interference across time, however, the DMT codes separately at the different
frequencies, while the single-tone method again uses precoding. Zero-forcing precoding
is known to be optimal at high SNR [7, 12], but not in general, so the DMT seems
to have an advantage here.

In our preliminary simulations, these details do not seem to result in major
differences in performance. For example, Fig. 3-16 shows ergodic sum capacity for two
receivers, four antenna elements, and four taps per channel. The difference between
the DMT method (which includes optimal ordering at each tone, and waterfilling
over both streams and tones) and our noncausal single-tone method (with a random
ordering, and waterfilling only over streams) is negligible except at very low SNR.
Apparently,  the  filtering  in  the  single-tone  method  is  able  to  shift  most  of  the  power 
to  the  zero-lag  taps  and  does  not  suffer  from  the  zero-forcing  approach  to  ISI.  It  also 
suggests  that  the  ordering  and  waterfilling  issues  are  of  secondary  importance  here. 
As  expected,  the  causal  precoding  method  of  Section  3.3.2  lags  in  performance.  The 
multitone  method  did  not  seem  to  be  very  sensitive  to  the  number  of  tones  chosen, 
either.  However,  one  reason  to  choose  a  larger  number  of  tones  would  be  to  lower 
the  overhead  of  the  cyclic  prefix:  if  this  had  been  included,  the  single-tone  method 
would  have  been  better  at  most  SNRs.  As  the  number  of  receivers  is  increased  and 
the  system  becomes  more  constrained,  all  of  these  second-order  effects  may  gain  in 
importance. 

Changing from the zero-forcing to the more general multiuser precoder that
maximizes sum capacity, as in [82], would be straightforward for the DMT method,
although as of now there is no algorithm that provably finds the optimal
beamforming matrix. Similarly modifying the single-tone solution to allow just the
right amount of interference, but now over both streams and time, is likely to be
possible in principle but difficult in practice. Recall, though, that our earlier results
suggested that at reasonable input SNRs, zero-forcing precoding (at least across
receivers) does seem to achieve a large part of the potential gain. Similarly, waterfilling
across both streams and frequencies may be easier for the DMT method (where they
combine to form a single, larger power control problem), but our simulations and
those of others [12] suggest that waterfilling does not play a major role except in
cases of blocking off particularly bad channel segments.

The  single-tone  solution  does  have  additional  practical  advantages.  Each  receiver’s 
stream  is  sent  with  a  single  modulation  and  channel  code,  as  opposed  to  potentially 
different  ones  for  each  of  the  N  DMT  subchannels.  The  single-tone  beamforming 
filters  and  precoder  are  somewhat  more  complex  than  in  multitone,  but  once  again 
there  is  only  one  set  of  these.  The  DMT  must  run  a  separate  set  of  beamformers 
and precoders, with different coefficients, for each subchannel. The transmitter must
not only operate all of these different functional units, with corresponding added
complexity at the receiver, but also determine the correct parameters for each.
Furthermore, since precoding is somewhat sensitive to channel estimation errors, having
so many separate precoders may require either more accurate channel estimation or
more conservative rates.

Both types of systems require a certain amount of delay. For the single-tone
system, the delay arises from the noncausal filtering and from waiting for earlier-ordered
streams to be precoded first. (Recall the algorithm operation of Fig. 3-13b.) Both of these
can be made finite by truncation. The DMT method processes signals block-wise, so
both transmitter and receiver must wait for entire blocks to appear before processing
them.


3.4  Concluding  Remarks 

The array processing described in this chapter provides another example of the power
of precoding/dirty paper coding approaches in a variety of applications. These methods
have the ability to layer information at rates that were previously only available
with additional receiver processing and coordination. We view their application to array
processing in terms of a factorization between linear and nonlinear operations. We
saw in both the flat fading and frequency-selective fading scenarios that this partitioning
can be done in many ways, leading to different types of processing, multiple
orderings among streams, and various performance tradeoffs.

Our discussion has concentrated on understanding these partitionings
and their implementations in practical systems. Several open questions remain along
these lines, many of which are active research directions. These include developing
coding techniques that “close the gap” to capacity, further characterizing robustness
to imperfect channel knowledge, finding operating points for various performance
criteria, and exploring the role of the distortion compensation parameter for specific
signaling schemes.


Chapter 4


Informed  Data  Scheduling 


We  now  focus  on  how  to  improve  performance  by  using  schedulers  that  are  aware 
of  the  physical  channel  state  and  other  system  components.  This  is  in  contrast  to 
traditional  layered  architectures,  where  the  two  problems  of  selecting  which  streams 
to  send  at  a  particular  time  and  of  communicating  those  selected  streams  with  highest 
efficiency  have  usually  been  considered  separately.  For  example,  cellular  systems  often 
have  a  medium  access  control  (MAC)  layer  that  assigns  slots  or  waveforms  relatively 
independently  of  the  channel  state,  then  a  physical  layer  that  may  apply  adaptive 
techniques  based  on  properties  of  the  links.  The  analysis  of  precoding  in  the  previous 
chapter,  although  incorporating  some  amount  of  data  stream  awareness,  was  in  this 
tradition  in  the  sense  that  the  channel  vectors  were  random  and  presumably  selected 
by  an  independent  upper  layer.  However,  the  strong  roles  that  interference  and  fading 
play  in  spatial  multiplexing  suggest  that  further  integration  between  layers  may  be 
fruitful. 

We have seen how both the overall system throughput and the reliability of individual
links can be improved by scheduling more than one stream simultaneously.
However, an array of a given size can only spatially multiplex a limited number of
streams effectively. When more than this many streams have data to send, the scheduler
must decide how they are to be grouped. If this scheduling process
is informed by the state of the channel vectors, the general scheme of the physical
layer, and a small amount of information about the data itself, then the system as a
whole will benefit. In addition to performance gains, the system may even be able to
reduce the computational requirements of the array processing, so that less-intensive
techniques such as beamforming can be used instead of precoding.

Any discussion of optimizing performance must be sensitive to the particular goals
and constraints of the system. In Section 4.1 we describe a classification scheme
based on the delay tolerances of data streams and explain what kinds of scheduling
are appropriate for each class. Once again, we assume that each stream is intended
for a unique receiver. Then, in Sections 4.2-4.5, we study scheduling algorithms for
these  classes  in  more  detail.  For  these  scenarios,  we  discuss  the  key  roles  of  channel 
orthogonality  and  magnitude,  and  how  our  algorithms  attempt  to  optimize  these 
to  improve  performance.  Finally,  in  Section  4.6,  we  bring  together  some  ideas  on 
combining  different  data  classes  within  the  same  system.  Although  more  research  is 
required  if  systems  must  give  rate  and  delay  guarantees  to  individual  streams,  our 
results  show  the  promise  of  channel-aware  scheduling  for  array  systems. 


4.1  Data  Model:  Classification  by  Delay  Tolerance 

Scheduling  algorithms  can  operate  at  a  variety  of  levels,  depending  on  the  features  of 
the  channel  and  data  streams  they  choose  to  model.  On  one  side  are  algorithms  from 
the  networking  community  such  as  weighted  fair  queuing  [53,  16]  that  typically  assume 
a  reliable  channel  and  seek  to  ensure  certain  qualities  of  service  for  a  heterogeneous 
set of data streams. A refined model called service curves [15, 60] enables a system to
satisfy both rate and delay guarantees simultaneously by having each stream specify
an entire set of rate goals at various delays. Unfortunately, these types of results are
difficult to apply to our wireless channel of interest, where the total system rate depends
highly on the particular set of streams selected at each time. Other approaches
pursue less ambitious service guarantees but include a greater consideration of the
physical channel. For example, in the multiuser uplink channel with single-element
antennas, Tse and Hanly [67] derive scheduling and power control algorithms for
maximizing the instantaneous weighted sum of rates among the different streams.
Okamoto [51] and Shan et al. [57] describe some scheduling algorithms for adding
spatial multiplexing to array systems while maintaining SINR goals. For a downlink
array  system,  Viswanath  and  Tse  [74]  suggest  an  adaptive  timesharing  strategy  that 
transmits  a  stream  whose  associated  channel  realization  has  high  quality  with  respect 
to  its  mean  value. 

In this chapter, we consider scheduling algorithms that take advantage of channel
knowledge and lower-layer spatial multiplexing, yet still respect essential differences
between classes of data. Since the overall performance will depend on the flexibility
of the scheduler to rearrange packets and set rates, we classify data streams based
on a few general levels of delay tolerance, as summarized in Table 4.1. The data
with the tightest delay, such as critical sensor data, must be sent within a small
number of packet lengths. The next level of data, perhaps modeling voice traffic, is
more tolerant but still useless if not received within a few coherence times. In other
words, the scheduler can rearrange the data streams into different groups, but cannot
count on waiting for channel realizations to change. We will see that a good strategy
here is to select groups of receivers with nearly orthogonal channel vectors. Finally,
data with the largest delay tolerance, such as background file transfers, is concerned
only with long-term average rates, meaning that the scheduler has the freedom to
send only those streams with good instantaneous channel realizations. Although this
classification, based on when data must be sent, should not be confused with the
discussion of signaling strategies in Section 2.2.1, similar performance criteria are
appropriate. We will primarily look at individual-receiver outage for medium-delay
data and ergodic sum capacity for large-delay data. Data with tight delay constraints
is of a different nature, and is concerned with the delay of a particular packet.

Type            Delay tolerance
tight delay     one to several packet lengths
medium delay    several packet lengths to several coherence times
large delay     more than several coherence times

Table 4.1: Summary of data types, organized by delay tolerances

Rather than a detailed source model, we consider a simple mechanism whereby
each stream delivers data into a separate buffer. When the buffer reaches some
minimum threshold, the stream places an entry into the queue of “ready” streams with
data to send. When this stream is selected for transmission, it passes data from the buffer
to the array processor functions and removes the entry from the queue. We first
consider appropriate scheduling techniques for queues of a single data class, then in
Section 4.6 discuss some ideas for systems with multiple classes.
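
The buffer-and-queue mechanism can be sketched in a few lines. This is an illustrative model only: the class and method names are hypothetical, the threshold is counted in packets, and the rule that a stream re-enters the queue when enough data remains is an assumption not stated in the text.

```python
from collections import deque

class ReadyQueue:
    """Per-stream buffers feeding a FIFO queue of streams with data to send."""

    def __init__(self, threshold):
        self.threshold = threshold   # minimum buffered packets before a stream is "ready"
        self.buffers = {}            # stream id -> deque of buffered packets
        self.ready = deque()         # ids of ready streams, oldest entry first

    def deliver(self, stream_id, packet):
        """A stream delivers one packet into its own buffer."""
        buf = self.buffers.setdefault(stream_id, deque())
        buf.append(packet)
        if len(buf) >= self.threshold and stream_id not in self.ready:
            self.ready.append(stream_id)

    def transmit(self, stream_id):
        """Hand one threshold's worth of data to the array processor; drop the queue entry."""
        self.ready.remove(stream_id)
        buf = self.buffers[stream_id]
        packets = [buf.popleft() for _ in range(self.threshold)]
        # Assumption: a stream that still holds enough data re-enters at the back.
        if len(buf) >= self.threshold:
            self.ready.append(stream_id)
        return packets
```

A scheduler is free to call `transmit` on any entry of `ready`, not only the head, which is the flexibility the channel-aware algorithms in this chapter rely on.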

These results are a starting point for making scheduling more aware of the channel
state and array processing. They show the potential for improved performance,
and how a channel-aware scheduler can reduce the complexity requirements of
other system components. Our specific algorithms are not meant to replace quality
of service-based methods without further development. Elements such as source
distributions and admission control are needed before they can hope to make the same
types of guarantees. However, our contention is that an efficient synthesis of the two
approaches must proceed from a firm basis in the lower-level awareness rather than
just a small adjustment to existing quality of service-based methods.


4.2  Spatial  Multiplexing  Performance  for  Data  with 
Medium  Delay  Tolerance 

Before  developing  scheduling  algorithms,  we  must  first  identify  the  key  ways  in  which 
scheduling  can  impact  performance.  Let  us  say  that  for  data  with  medium  delay 
tolerance,  the  scheduler  must  send  each  stream  in  the  “ready”  queue  within  a  given 
bounded  waiting  time,  and  that  the  realized  channel  vectors  remain  constant  within 
this  time  period.  The  primary  impact  of  the  scheduler,  then,  will  be  in  how  these 
channel  vectors  are  grouped  together. 

In this section, we study the performance of spatial multiplexing methods under
different assumptions on the set of channel vectors. The focus is on zero-forcing
beamforming, with some discussion of precoding as well. We will see that
the  angle  between  channel  vectors  plays  a  major  role,  and  that  the  scheduler  should 
therefore  choose  groups  of  receivers  with  nearly  orthogonal  channel  vectors.  At  the 
limit  of  a  purely  orthogonal  set,  precoding  reduces  to  beamforming,  suggesting  that 
with  channel-aware  scheduling,  the  computational  requirements  of  array  processing 
will  be  reduced.  These  findings  will  inform  the  scheduling  algorithms  developed  in 
Section  4.3. 

4.2.1  Diversity  Analysis  with  Random  Channel  Vectors 

An  interesting  way  of  looking  at  the  tradeoff  between  the  number  of  receivers  and 
performance  is  in  terms  of  the  diversity  benefit  of  the  array.  For  a  single  receiver  and 
Rayleigh  fading,  as  we  saw  in  Fig.  2-3,  the  effect  of  adding  transmit  antenna  elements 
is  both  to  increase  the  average  SNR  and  also  to  change  the  distribution  to  one  with 
considerably  less  variation  relative  to  the  mean  (in  particular,  an  Mth-order  Erlang). 

When there are K receivers, we expect the power constraint to limit the average
SNR per receiver to only $M\mathcal{P}/(K\mathcal{N}_0)$ rather than $M\mathcal{P}/\mathcal{N}_0$. However, the interference
issue turns out to have a severe effect as well. Leveraging a result from the zero-forcing
receive diversity solution [79, 65], it can be shown that in the absence of power control
(i.e., if the transmitter sends signals of equal power to each receiver),

$$\mathrm{SNR} \sim \mathrm{Erlang}(M - K + 1), \qquad \text{mean} = \frac{M - K + 1}{K}\,\frac{\mathcal{P}}{\mathcal{N}_0}. \tag{4.1}$$

The important observation is that there is effectively a tradeoff between the diversity
benefit of the array and the number of receivers to be multiplexed. For each receiver
the transmitter has to null out, a stream loses one degree of diversity. For K = M,
for example, (4.1) suggests that, once the K-fold loss in average transmitted power is
normalized out, each receiver’s SNR distribution is the same as if we were transmitting
to only that one receiver using a single antenna element.
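
The mean in (4.1) can be sanity-checked by simulation. The sketch below assumes i.i.d. unit-variance complex Gaussian channel coefficients and uses the fact that, with zero-forcing and equal power per stream, a receiver's gain is the squared norm of the part of its channel vector that is orthogonal to the other receivers' vectors. Multiplying this gain by $\mathcal{P}/(K\mathcal{N}_0)$ gives the SNR, so the average gain should be close to M - K + 1. All helper names are hypothetical.

```python
import random

def complex_gaussian_vector(m, rng):
    """m i.i.d. CN(0, 1) entries (real and imaginary parts of variance 1/2 each)."""
    s = 0.5 ** 0.5
    return [complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(m)]

def inner(a, b):
    """Inner product <a, b> = sum of a_i * conj(b_i)."""
    return sum(x * y.conjugate() for x, y in zip(a, b))

def zf_gain(target, interferers):
    """Squared norm of the part of `target` orthogonal to all `interferers`."""
    basis = []
    for v in interferers:                  # Gram-Schmidt on the interfering channels
        for q in basis:
            c = inner(v, q)
            v = [x - c * y for x, y in zip(v, q)]
        norm = inner(v, v).real ** 0.5
        basis.append([x / norm for x in v])
    for q in basis:                        # remove the interferer subspace from the target
        c = inner(target, q)
        target = [x - c * y for x, y in zip(target, q)]
    return inner(target, target).real

def mean_zf_gain(m, k, trials, seed=0):
    """Monte Carlo average of the zero-forcing gain for receiver 1 of k."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        h = [complex_gaussian_vector(m, rng) for _ in range(k)]
        total += zf_gain(h[0], h[1:])
    return total / trials
```

For example, `mean_zf_gain(4, 3, 5000)` should land near 2 and `mean_zf_gain(4, 1, 5000)` near 4, reflecting the one degree of diversity lost per nulled receiver.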

One  might  think  that  using  power  control  on  the  zero-forcing  solution  to  equalize 
the  SNRs  of  the  different  receivers,  as  was  done  with  precoding  in  Section  3.2.2,  might 
help  increase  the  effective  diversity  (perhaps  at  the  expense  of  peak  performance). 
With power control, the SNR for each receiver will be

$$\mathrm{SNR}_{\mathrm{pc}} = \frac{\mathcal{P}}{\mathcal{N}_0 \sum_{k=1}^{K} \frac{1}{\sigma_k^2(H)}} \le \frac{\mathcal{P}}{\mathcal{N}_0}\,\sigma_{\min}^2(H), \tag{4.2}$$

where $\sigma_k(H)$ are the singular values of H. This is shown by starting from the fact
that the beamforming matrix G must be a scaled pseudoinverse of H and finding
that scaling factor:

$$\begin{aligned}
\text{Constraints:}\quad G &= c\,H^\dagger (H H^\dagger)^{-1}, \qquad \operatorname{trace}(G G^\dagger) = \mathcal{P} \\
\Rightarrow\quad c^2 \operatorname{trace}\!\left[(H H^\dagger)^{-1}\right] &= \mathcal{P} \\
c^2 \sum_{k=1}^{K} \frac{1}{\sigma_k^2(H)} &= \mathcal{P} \\
c^2 &= \frac{\mathcal{P}}{\sum_{k=1}^{K} \frac{1}{\sigma_k^2(H)}}.
\end{aligned}$$

When K = M, the upper bound in (4.2) has an exponential distribution [18] and
once again equals the single-user, single antenna element distribution (except with a
loss of K in average power).

The diversity loss will be less severe when there are more antenna elements than
receivers, and performance will tail off more gracefully for good non-zero-forcing
strategies, but in all of these cases, we see the fundamental conflict between sending to
more receivers and the benefits of diversity.


4.2.2  Diversity  With  Orthogonal  Channel  Vectors 

The  discussion  above  assumed  that  we  must  transmit  to  a  group  of  randomly-selected 
receivers  at  a  single  time.  It  is  exactly  this  random  selection  of  receivers  that  causes 
the  loss  in  diversity.  When  selecting  streams  to  spatially  multiplex,  one  solution  would 
be  to  choose  only  those  streams  whose  receivers  have  nearly  orthogonal  channels. 
Before  going  on  to  propose  specific  systems  that  attempt  to  do  this,  we  investigate 
the  performance  potential  when  sending  to  a  random  set  of  orthogonal  channels. 

Suppose that K receivers are multiplexed using an M-element array ($K \le M$).
The channel coefficients have the same distribution as before, but now assume that
the channels are orthogonal, so that the transmitter can beamform perfectly to each
receiver without adding interference. The distribution in (4.1) now becomes

$$\mathrm{SNR}_{\mathrm{orth}} \sim \mathrm{Erlang}(M), \qquad \text{mean}_{\mathrm{orth}} = \frac{M}{K}\,\frac{\mathcal{P}}{\mathcal{N}_0}. \tag{4.3}$$

As expected, each receiver now gets the full M-level diversity, with just the 1/K
factor in average SNR due to multiplexing among K streams. Precoding can be seen
as achieving a compromise between (4.1) and (4.3), in that the kth receiver sees an
Erlang(M - k + 1) distribution in SNR.

With orthogonal channel vectors and power control,

$$\mathrm{SNR}_{\mathrm{orth,pc}} = \frac{\mathcal{P}}{\mathcal{N}_0 \sum_{k=1}^{K} \frac{1}{\|\mathbf{h}_k\|^2}} \le \frac{\mathcal{P}}{\mathcal{N}_0}\,\|\mathbf{h}_{\min}\|^2.$$

Because each receiver, before power control, has an equal or greater SNR than if the
channel vectors had not been orthogonal, this value is necessarily larger than (4.2).
The difference tends to be significant, since the harmonic mean in these formulas is
usually dominated by the weaker elements, and $\|\mathbf{h}_{\min}\|^2$ will typically be much larger
than the squared minimum singular value $\sigma_{\min}^2(H)$. The outage distribution for an 8-element
array is shown in Fig. 4-1, where the differences in both average SNR and the shape
of the distribution are substantial.

Figure 4-1: Outage probability for an 8-element array transmitting to 8 receivers,
with  an  SNR  per  link  of  5  dB  and  power  control. 


The improvement in using orthogonal receivers can also be seen in the deterministic
asymptotic performance (as in [68] for receive diversity) for large systems. When
K and M grow to infinity according to a certain ratio $\beta = K/M$, the SNRs
with random and orthogonal receivers are

$$\mathrm{SNR} = \frac{\mathcal{P}}{\mathcal{N}_0}\,\frac{1-\beta}{\beta}, \qquad \mathrm{SNR}_{\mathrm{orth}} = \frac{\mathcal{P}}{\mathcal{N}_0}\,\frac{1}{\beta}.$$

These asymptotic results are plotted in Fig. 4-2 and show a 3 dB advantage for
orthogonal channel vectors at $\beta = 0.5$, 6 dB at $\beta = 0.75$, and rapidly increasing after
that. For large systems with small $\beta$, even randomly selected channels will most often
be nearly orthogonal, so there is not much to be gained in using selected receivers.
This does not completely carry over to small systems with small $\beta$, since the
nondeterministic performance may still result in channels with bad correlations, but the
general result still holds that orthogonal receiver selection is much more important
to systems with higher K/M.

Figure 4-2: Deterministic received SNR (per receiver) for a large system with $K/M = \beta$
and an input SNR per link of 5 dB.
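
As a quick check of the quoted gaps, take the limiting SNRs to be $(\mathcal{P}/\mathcal{N}_0)(1-\beta)/\beta$ for random receivers and $(\mathcal{P}/\mathcal{N}_0)/\beta$ for orthogonal receivers. The advantage of orthogonal selection is then their ratio, which does not depend on the input SNR:

$$\frac{\mathrm{SNR}_{\mathrm{orth}}}{\mathrm{SNR}} = \frac{1/\beta}{(1-\beta)/\beta} = \frac{1}{1-\beta}, \qquad 10\log_{10}\frac{1}{1-\beta} \approx \begin{cases} 3.0\ \text{dB}, & \beta = 0.5, \\ 6.0\ \text{dB}, & \beta = 0.75. \end{cases}$$

The ratio grows without bound as $\beta \to 1$, which matches the rapid increase noted above, and tends to 1 (0 dB) as $\beta \to 0$.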

A related advantage of trying to find orthogonal channel vectors is the added numerical
stability of the beamforming algorithms. For example, when the zero-forcing
beamforming matrix under power control (i.e., the pseudoinverse) is computed, relative
perturbations in H can be magnified by a factor bounded by $2\kappa_2(H)$, where

$$\kappa_2(H) = \frac{\sigma_{\max}(H)}{\sigma_{\min}(H)}$$

is the condition number of H, as long as the rank is not changed [59]. Similarly, if instead
precoding using the LQ factorization is performed, perturbations magnified by about $\kappa_2(H)$
are seen in Q and L, and $Q^\dagger Q$ differs from identity by a matrix of approximate norm
$\epsilon\,\kappa_2(H)$, where $\epsilon$ is the machine precision. Using nearly orthogonal rather than random
channel vectors results in much better conditioned H matrices (i.e., with smaller $\kappa_2(H)$),
thus producing more stable computations.

We have demonstrated some of the advantages of using receivers with uncorrelated
channels. However, unless $K \ll M$, such channels are unlikely to occur among
randomly chosen receivers, and this is the regime that provides the smallest advantage.
The solution is to take a wider view of the system, which will likely contain more
streams than can be spatially multiplexed at any one time. We propose a degree
of integration between the physical and MAC layers, so that the correlations between
channels can inform the transmitter on how to intelligently group streams to help
achieve better overall performance.


4.3  Scheduling  Algorithms  for  Data  with  Medium 
Delay  Tolerance 


We  now  go  on  to  develop  scheduling  algorithms  and  evaluate  their  performance.  We 
begin  with  an  example  where  all  streams  are  grouped  into  subsets,  and  then  consider 
a  more  dynamic  queuing  model  whereby  the  set  of  streams  with  enough  data  to 
send  changes  over  time.  As  more  data  streams  enter  the  queue,  performance  should 
increase  because  the  scheduler  has  more  flexibility  to  select  appropriate  groupings. 
We  will  see  that  not  only  does  this  expected  behavior  occur,  but  also  that  most  of 
the  improvement  can  happen  with  a  fairly  small  number  of  streams  in  the  queue. 

The purpose of this study is to determine the potential for channel-aware scheduling.
We do not make an attempt to optimize for delay, but rather to minimize outage
while ensuring that all streams in the queue get scheduled before their channel
parameters are likely to change. A more complete characterization of tradeoffs between
delay and outage or throughput performance remains for future study. We do provide
some analysis of delay characteristics, and will revisit this issue in Section 4.6,
but further research is required if precise delay guarantees are necessary. However,
because of the outage improvement seen with only a small amount of grouping flexibility,
we expect that systems may be able to support high rates even with additional
delay constraints. Conversely, a system without any additional constraints may be
able to use simplified schedulers and still achieve most of the available gains.

Figure 4-3: Simulated outage curves when transmitting from a 4-element array to
groups of 3 receivers using zero-forcing beamforming, at an input SNR per link of
$\mathcal{P}/\mathcal{N}_0 = 5$ dB. Streams are partitioned into groups using a “greedy” algorithm.

4.3.1  Static  Model  Example 

Consider a system with K' streams, all of which send continuous data. Therefore,
the scheduler must divide all of them into spatial multicast groups for each channel
realization. If these groups are of size K, then $\lceil K'/K \rceil$ groups are necessary. Given
the discussion in the previous section, we would like the scheduler to select groups
in which the angles between channel vectors are large. A good scheduling algorithm
should be able to approach the bound of orthogonal channel vectors as K' increases.

Unfortunately, the optimal scheduling for this problem is unknown, and in any
case appears to be combinatorial in nature. More promising are “greedy” algorithms,
which select the best option at each step rather than doing a global search.
Fig. 4-3 shows simulation results for a 4-element array and subsets of 3 receivers using an
algorithm of this type:

1.  The  first  \K' /K']  streams  are  put  into  separate  groups 


106 


2.  The  next  \K' /K^  streams  are  placed,  one  by  one,  into  the  groups  to  which 
they  are  “most  orthogonal”  (that  is,  the  largest  angle  between  the  associated 
channel  vectors),  until  all  groups  now  contain  two  streams. 

3.  The  last  set  of  streams  are  placed  similarly,  but  now  to  the  group  with  the  max 
min  of  angles  to  those  already  in  the  group. 
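
The three steps above can be sketched in a few lines of Python. This is only an illustrative reading of the procedure, not the simulation code behind Fig. 4-3: the function names `angle` and `greedy_groups` are hypothetical, channel vectors are assumed to be nonzero sequences of complex numbers, and ties are broken in favor of the lowest-numbered group.

```python
import math


def angle(u, v):
    """Angle in [0, pi/2] between two complex channel vectors."""
    inner = abs(sum(complex(a).conjugate() * b for a, b in zip(u, v)))
    norms = math.sqrt(sum(abs(a) ** 2 for a in u) * sum(abs(b) ** 2 for b in v))
    return math.acos(min(1.0, inner / norms))


def greedy_groups(channels, group_size):
    """Partition stream indices into groups, one 'layer' of streams at a time."""
    num_groups = math.ceil(len(channels) / group_size)
    # Step 1: the first num_groups streams each seed a separate group.
    groups = [[i] for i in range(min(num_groups, len(channels)))]
    for start in range(num_groups, len(channels), num_groups):
        open_groups = list(range(num_groups))  # groups not yet extended in this layer
        for s in range(start, min(start + num_groups, len(channels))):
            # Steps 2-3: choose the open group whose smallest angle to stream s is largest.
            best = max(
                open_groups,
                key=lambda g: min(angle(channels[s], channels[m]) for m in groups[g]),
            )
            groups[best].append(s)
            open_groups.remove(best)
    return groups
```

For groups that hold a single stream, the max-min rule reduces to the plain largest-angle rule of step 2, so one loop covers both steps.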

Within  each  group,  we  use  zero-forcing  beamforming  without  power  control.  As 
the  population  size  increases,  the  performance  grows  steadily  from  the  second  order 
diversity  of  random  groupings  to  the  fourth  order  diversity  of  perfectly  orthogonal 
channel  vectors.  At  10%  outage,  4  dB  out  of  the  potential  5  dB  gain  is  achieved  with 
32  groupings.  At  1%  outage,  a  similar  portion  of  the  total  7.5  dB  gain  is  achieved. 
The ergodic capacity (without power control or waterfilling) increases as well, from
4.5  bits/channel  use  for  random  selection  to  6.2  for  32  groups,  out  of  a  potential  6.8 
for  orthogonal  channel  vectors. 

A first-fit algorithm and its variations in [51] and [57] also multiplexed a set number
of users into timeslots in a greedy-type manner. However, those efforts sought to
maximize the number of users in each slot given SINR constraints rather than optimize
outage given a number of users per slot, making comparisons difficult. Additionally,
they did not directly emphasize achieving orthogonality between channel vectors, but
did so only implicitly through the SINR constraint.

4.3.2  Dynamic  Queuing  Model 

A more realistic and dynamic model considers streams queuing up at the transmitter
when  they  have  data  to  send.  The  scheduler  could  just  group  the  first  K  streams  at 
the  head  of  the  queue  together,  but  in  the  spirit  of  this  section,  higher  performance  can 
be  achieved  if  there  is  more  freedom  in  choosing  how  streams  are  grouped  together. 
We  quantify  this  idea  by  allowing  a  window  of  K'  streams  at  the  front  of  the  queue 
from  which  K  must  be  selected.  As  K'  increases,  we  expect  performance  to  increase 
as  a  more  orthogonal  set  of  channel  vectors  can  be  chosen. 

To be more specific, imagine a replenishable queue of “ready” streams, each
associated with a random channel vector. A diagram is shown in Fig. 4-4. To ensure
a bounded waiting time, the first entry in the queue must be sent at the current
time, but the other K - 1 streams to be multiplexed can be chosen from anywhere
in the K' - 1 remaining streams within the window. For the simulation, these are
chosen in a similar manner to the static channel, but now we select the channel vector
that is most orthogonal to the subspace of the channels already chosen for the group.
Equivalently, this is the receiver that will lose the smallest fraction of its SNR upon
zero-forcing beamforming. After this set of K entries is sent, they are removed from
the queue, and K new packets with random channel vectors are added to the end of
the queue. By adding more random channel vectors each time, we either assume a
population size much larger than K', or that by the time new packets from the same
streams reach the window, their receivers’ channel vectors have changed.

Figure 4-4: Diagram of queueing model

Fig.  4-5  shows  simulation  results  for  an  8-element  array  that  schedules  K  =  8 
streams at a time. Because the data has medium delay tolerance, we evaluate
performance by individual-receiver outage and use power control. As in Fig. 4-3, the
curve  when  the  scheduler  just  selects  the  eight  streams  at  the  head  of  queue  is  far 
from  the  bound  for  orthogonal  channel  vectors.  However,  even  a  very  small  amount 
of  freedom,  selecting  eight  of  the  first  nine  streams,  leads  to  gains  of  5  to  10  dB  for 
outages  in  the  range  of  1%  to  10%.  By  the  time  the  window  size  has  reached  twenty, 
the  outage  curves  are  starting  to  approach  the  orthogonal  bound. 

Figure 4-5: Outage probabilities for a queuing system with M = 8 transmit antenna
elements and K = 8 simultaneous receivers chosen from a window size varying from
8 to 20. We use zero-forcing beamforming, power control, and an input SNR per link
of 5 dB. Shown for comparison is a bound on outage corresponding to orthogonal
channel vectors.

Other array processing methods will also benefit from channel-aware scheduling.
For instance, orthogonality between channel vectors is also desirable for the precoding
solutions of Chapter 3. There, later streams must direct nulls to earlier-ordered
receivers, although the reverse is not true. The scheduling technique described above
therefore selects the new stream that stands to lose the least by having to precode
off of the streams already selected. In Fig. 4-6, we compare the earlier beamforming
curves side-by-side with those for precoding, where power control and the max min
ordering method of Section 3.2.2 are used. Note how precoding with a small window
size achieves similar outage performance to beamforming with a large window size.
Precoding improves still further with larger window sizes, but by this time the
incremental gains are smaller. We see similar trends looking at ergodic sum capacity in
Fig. 4-7. This illustrates one of our main themes, that to get most of the benefits
of a transmitter array, a system designer often has a choice between sophisticated
scheduling or array processing and does not necessarily have to use high complexity
in both.

Figure 4-6: Same as Fig. 4-5, but we now add similar curves for precoding, with
power control and proposed max min ordering.

For various reasons, including numerical stability of the beamforming and LQ
operations, it might be desirable at times to spatially multiplex only 6 or 7 streams
using the 8-element array. Outage and sum capacity curves will show the same
general trends under these circumstances, though with less relative improvement as
the window size increases. The particular $\mathcal{P}/N_0$ value will also affect the exact rate
tradeoffs associated with multiplexing more or fewer streams.

It  is  desirable  to  have  a  low-complexity  method  of  grouping  streams.  The  approach 
used  in  this  section  can  be  implemented  as  follows:  Find  an  orthonormal  basis  (using 
the  Gram-Schmidt  procedure,  for  example)  for  the  channel  vectors  of  the  streams 
already  in  a  group.  Then,  multiply  a  candidate’s  channel  vector  by  the  matrix  of 
this  basis,  and  determine  the  fraction  of  energy  that  remains.  Note  that  the  matrix 
stays  the  same  for  all  candidates,  and  once  one  candidate  is  chosen,  only  one  new 
element  of  the  updated  basis  needs  to  be  computed.  The  group  selection  appears 
to  be  somewhat  robust  to  different  methods  as  well.  A  different  grouping  based  on 
minimizing  the  condition  number  achieved  almost  the  same  performance  as  this  one 
(at  higher  complexity).  More  ad-hoc  methods  may  achieve  similar  performance  at 
lower complexity. With most reasonable methods, complexity grows with the window
size; fortunately, it appears that most of the performance gains are achievable with
relatively small windows.

Figure 4-7: Ergodic sum capacity for zero-forcing beamforming and precoding, for
the same simulation as in Fig. 4-5 except without power control.
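
As a rough illustration of the low-complexity grouping just described, the sketch below selects K streams from a window by repeatedly picking the candidate that keeps the largest fraction of its energy after projection away from the chosen channels. It is a minimal sketch under stated assumptions, not the code used for Figs. 4-5 to 4-7: the names `inner`, `residual`, and `select_group` are hypothetical, the projection is applied one basis vector at a time rather than as a single matrix product (the two are equivalent for an orthonormal basis), and the group size is assumed not to exceed the window size.

```python
import math


def inner(u, v):
    """Hermitian inner product <u, v> (conjugating the first argument)."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))


def residual(h, basis):
    """Part of h left after removing its components along an orthonormal basis."""
    r = list(h)
    for b in basis:
        coeff = inner(b, r)
        r = [x - coeff * y for x, y in zip(r, b)]
    return r


def select_group(window, group_size):
    """Choose group_size indices from the window; index 0 (head of queue) is forced."""
    chosen, basis = [], []
    candidates = list(range(len(window)))
    pick = 0
    while True:
        chosen.append(pick)
        candidates.remove(pick)
        # Gram-Schmidt update: only one new basis vector per selected stream.
        r = residual(window[pick], basis)
        norm = math.sqrt(inner(r, r).real)
        if norm > 1e-12:
            basis.append([x / norm for x in r])
        if len(chosen) == group_size:
            return chosen

        def kept_fraction(i):
            # Fraction of candidate i's energy orthogonal to the chosen subspace.
            r_i = residual(window[i], basis)
            return inner(r_i, r_i).real / inner(window[i], window[i]).real

        pick = max(candidates, key=kept_fraction)
```

The basis is shared by all candidates within a round, and each round adds a single vector to it, which is the property the text relies on to keep the per-candidate cost low.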


Delay  Characteristics 

This selective grouping procedure can result in longer delays than a simple
first-in-first-out (FIFO) model, but this delay is bounded. Consider a total population of P
streams intended for distinct receivers, backlogged so that the queue always contains
exactly one packet from each stream. (That is, a stream will always have data, but
is not allowed to put another packet on the queue until its previous packet has been
sent.) The transmitter sends K streams at a time. In FIFO, a packet therefore
always jumps K places toward the head of the queue at each time. In the selective
grouping presented above, the packet always moves at least one spot, but perhaps
not more. We can summarize the maximum and minimum delays, in terms of turns
in the queue, as follows:

•  FIFO: The maximum delay is

$$\left\lfloor \frac{P-1}{K} \right\rfloor .$$

The minimum delay (assuming all streams are backlogged) is only one less than
this. At each time, K new streams are added to the queue. The delay can vary
by one depending on where the packet of interest is among these K.


•  Selective grouping, window size P: The maximum delay is

$$P - K,$$

but this will occur very rarely. The minimum delay is zero, because this packet
could be chosen as soon as it enters the queue.


•  Selective grouping, window size K', where K < K' < P: This provides a
compromise, where packets jump K places each turn until they reach the window,
and may move more slowly after that. By judiciously choosing a relatively small
window size K' that achieves most of the available performance gains, one can
improve delay as well as complexity. The maximum delay is

$$1 + \left\lfloor \frac{P-K'-1}{K} \right\rfloor + K' - K.$$

The first two terms are the time it takes to enter the window, while K' - K
is the maximum time spent within the window. Note that this reverts to the
other two cases (in delay and algorithmically) when K' = K or K' = P.
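
A short numerical check makes the three worst-case bounds easier to compare. The sketch below assumes the fractional terms are rounded down to whole turns, which is the reading under which the windowed bound reduces to the other two at K' = K and K' = P; the function names and the example values P = 32, K = 8 are illustrative assumptions only.

```python
def fifo_max_delay(P, K):
    """Worst-case FIFO delay in turns, assuming the fraction is rounded down."""
    return (P - 1) // K


def window_max_delay(P, K, K_win):
    """Worst-case delay in turns for selective grouping with window size K_win."""
    turns_to_enter = 1 + (P - K_win - 1) // K  # time to reach the window
    turns_in_window = K_win - K                # slowest possible progress inside it
    return turns_to_enter + turns_in_window


if __name__ == "__main__":
    P, K = 32, 8
    for K_win in (8, 9, 12, 20, 32):
        print(K_win, window_max_delay(P, K, K_win))
```

With these example values the bound grows from 3 turns at K' = 8 (FIFO) to 24 turns at K' = 32 (window covering the whole population), with a window of 12 giving 7 turns.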


Our  experience  from  the  preceding  simulations  suggests  that  the  worst-case  delay 
occurs  only  rarely.  Also,  since  selective  grouping  increases  the  amount  of  information 
that  can  be  transmitted  at  each  time,  the  disparity  in  delay  per  information  bit  will 
not  be  as  great  as  that  of  delay  in  terms  of  turns  in  the  queue  as  given  above.  However, 
we  do  suggest  that  if  minimizing  delay  is  of  greatest  importance,  the  scheduling 
algorithm should be modified somewhat.

4.4 Large Delay Tolerance


While  the  scheduling  delay  in  the  previous  section  was  on  the  order  of  several  packet 
lengths,  other  types  of  data  may  tolerate  much  longer  waiting  times.  For  example, 
when doing file transfers or system backups, achieving a high average throughput may
be  much  more  important  than  the  delay  on  any  particular  packet.  In  these  cases,  the 
system  can  simply  maximize  the  sum  rate  over  each  channel  realization,  and  over 
time  the  rates  for  the  different  streams  will  even  out. 

This  idea  relates  to  a  growing  body  of  literature  on  “multiuser  diversity,”  in  which 
each stream communicates when its associated channel vector is near its peak strength.
However, most of these results are for timesharing strategies where only one stream
can transmit at a time. Below, we provide a discussion of when such timesharing
strategies are optimal and go on to develop scheduling algorithms for spatial
multiplexing.


4.4.1  Relation  to  Timesharing  Strategies 

One  can  gain  a  perspective  on  timesharing  versus  spatial  multiplexing  by  placing 
our  problem  within  a  larger  context  of  multiterminal  wireless  scenarios.  The  channel 
may  be  in  the  uplink  or  downlink  direction,  and  the  base  station  may  or  may  not 
have  a  multiple-element  array.  By  looking  at  these  different  cases,  we  can  gain  an 
appreciation of the roles that waterfilling, power constraints, and spatial multiplexing
play.  In  some  cases,  timesharing  will  be  sufficient  for  maximizing  the  sum  capacity, 
while  in  others,  the  gains  associated  with  spatial  multiplexing  will  far  outweigh  those 
achieved  by  simply  using  a  stream  during  a  good  channel  realization. 

Table  4.2  summarizes  some  results  for  these  different  scenarios.  One  important 
factor  is  the  form  of  power  constraint  used.  In  this  thesis,  we  have  concentrated  on 
a peak power constraint, so that at each time, the expected power is below some
prescribed limit, $E[\mathbf{x}^\dagger \mathbf{x}] \le \mathcal{P}$. A system could also potentially allow a transmitter
to save up unused power for a later time, so $\mathcal{P}$ becomes a constraint on average
power  over  all  time.  The  first  case  is  more  appropriate  for  satisfying  regulatory  limits 
or  minimizing  out-of-cell  interference,  while  the  second  may  be  a  better  model  for 
maximizing  battery  life.  With  an  average  power  constraint,  each  fading  realization 
may  be  considered  as  a  kind  of  parallel  channel  over  time  [66]  over  which  power  can 
be waterfilled. In either case, the uplink power constraint is for each user individually,
while the downlink power constraint is for all streams combined. The results shown
all assume ergodic variation in the channel parameters and equal distributions for all
users, but do not rely on a Rayleigh fading model.

Scenario          Peak Power Constraint   Average Power Constraint
uplink            No                      Yes [40, 67]
downlink          Yes [66]                Yes [66]
uplink, array     No                      No [76]
downlink, array   No [8]                  No

Table 4.2: Summary of when timesharing strategies are sufficient for maximizing sum
capacity.

When  there  are  multiple  users  in  the  system  with  separate  streams,  the  system  has 
a  choice  of  sending  multiple  streams  at  once  with  signaling-level  techniques  such  as 
beamforming  (if  there  is  an  array),  dirty-paper  coding,  and  interference  cancellation. 
Perhaps  surprisingly,  then,  Knopp  and  Humblet  [40]  reported  that  in  a  basic  uplink 
scenario,  with  an  average  power  constraint  and  no  array,  timesharing  is  sufficient  to 
achieve  the  ergodic  sum  capacity.  Simply  put,  the  user  with  the  best  instantaneous 
channel  realization  gets  a  chance  to  communicate.  It  then  waterfills  power  over  all 
such  situations  in  which  it  expects  to  be  selected  (so  at  some  times,  there  may  be  no 
active  streams).  Similar  results  were  shown  for  the  downlink  [66]  and  have  resulted  in 
a timesharing mode called HDR for the CDMA 2000 cellular specification [69]. This
idea does not carry over as well to a peak power constraint on the uplink, since a
corner point of the rate region (see [14]) with more than one active user will often
result in the best sum rate for a particular realization.

Things change significantly when the base station has a multiple-element array.
With  array  processing  techniques  such  as  those  discussed  in  this  thesis,  it  can  often 
separate  the  signals  to  or  from  the  various  users  enough  that  the  channel  starts  to  look 
more  like  parallel  streams  than  additive  interference.  At  this  point,  the  system  can 
distribute  power  among  the  different  streams  and  achieve  spatial  multiplexing  gains, 
as  discussed  in  Section  2.3.2  and  elsewhere.  As  we  will  see,  this  effect  can  become  even 
more important than hitting each user at its peak channel strength. For example, if
there are two users and a base station with a large number of antenna elements, then
the channel vectors will usually be nearly orthogonal and interference will not be a
major issue. Except at very low SNR or very high channel quality variation (in which
case the users want to concentrate the streams in a small portion of the available
time), there are likely to be times when the base station will communicate with both
at once.

Figure 4-8: Ergodic sum capacity for channel-aware timesharing strategies with
various numbers of transmit antenna elements. The curves were computed using
numerical integration over independent Rayleigh fading at an input SNR per link of 5 dB.

To illustrate, we plot in Fig. 4-8 the ergodic sum capacity for downstream
timesharing under a peak power constraint. Without an array, the overall system
performance increases noticeably with the number of receivers, as the transmitter can
select a receiver whose channel realization is near its peak strength. As the array size
increases, we see a lesser relative benefit. This is because the array enables single-user
beamforming, which results in a received SNR distribution with considerably less
relative variation over time. On the other hand, spatial multiplexing can achieve much
higher rates even under simple scheduling methods, as shown earlier in Fig. 3-4. This
motivates our emphasis on scheduling algorithms for spatial multiplexing rather than
timesharing in the next subsection.

There are still situations where timesharing to different receivers from an array
may make sense. In some cases, the channel strength varies considerably even
after the array processing. For example, the Infostations proposal [33] models receivers
moving relative to the transmitter, so that closer ones may have significantly better
instantaneous channels than those farther away. Another reason is the possibility of
attaining diversity gains without as detailed channel information. This is the
subject of the so-called “dumb antennas” scheme of [74], in which the transmitter sends
along random, time-varying beamforming directions. As the number of receivers
increases, the transmitter can approach the ideal timesharing performance discussed
above while only knowing the instantaneous SNRs of the receivers and not their full
channel vectors.

4.4.2  Scheduling  for  Spatial  Multiplexing 

We  now  proceed  to  develop  scheduling  that  incorporates  spatial  multiplexing  for  data 
streams with long delay constraints. The goal, once again, is to maximize the sum
capacity over each channel realization and let the rates for individual streams average
out over time. Although the incremental gains are not always significantly greater
than those discussed earlier for medium-delay data, these new strategies do lead to
increased performance and in some situations point to lower-complexity scheduling
algorithms.

We know that precoding maximizes the sum capacity when the number of streams
is less than the number of transmitter antenna array elements M [82]; perhaps some
extension is possible when there are greater numbers of streams. One might conjecture
that  this  would  involve  a  selection  of  no  more  than  M  receivers  getting  information  at 
each  time,  since  the  transmitter  can  send  no  more  than  this  many  precoded  streams 
at  once  and  still  completely  null  out  interference. 

This selection process recalls the “greedy” max sum ordering discussed in
Section 3.2.2, and indeed involves many of the same issues. As a practical approach,
we could use the same method and simply stop after M streams have been selected.
The results of this procedure, including the subsequent waterfilling across streams,
are shown in Fig. 4-9. There is a clear improvement with precoding over the grouping
method that only considers orthogonality between channel vectors. Zero-forcing
beamforming does not improve as much, although a method more tuned toward this
transmission strategy may be able to achieve somewhat better gains.

Figure 4-9: Ergodic sum capacity for zero-forcing beamforming and precoding with an
8-element array and large delay constraints. “Max sum proposed” uses the method of
Section 3.2.2, while “Orthogonality info only” uses only orthogonality information, as
in Section 4.3. Waterfilling across streams was used once the receivers were selected.
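
The waterfilling step applied after receiver selection can be illustrated with a standard allocation over parallel channels. This is a generic textbook sketch rather than the thesis's own implementation: it assumes that, once the receivers are selected and interference is nulled, each stream i sees an effective positive gain `gains[i]` (signal-to-noise ratio per unit transmit power), and the names `waterfill` and `sum_rate` are hypothetical.

```python
import math


def waterfill(gains, total_power):
    """Split total_power over parallel streams to maximize sum log2(1 + g_i * p_i)."""
    order = sorted(range(len(gains)), key=lambda i: gains[i], reverse=True)
    for n in range(len(order), 0, -1):
        active = order[:n]  # try the n strongest streams
        level = (total_power + sum(1.0 / gains[i] for i in active)) / n
        if level > 1.0 / gains[active[-1]]:
            break  # weakest active stream gets positive power, so this set is valid
    powers = [0.0] * len(gains)
    for i in active:
        powers[i] = level - 1.0 / gains[i]
    return powers


def sum_rate(gains, powers):
    """Sum rate in bits per channel use for the given power allocation."""
    return sum(math.log2(1.0 + g * p) for g, p in zip(gains, powers))
```

For example, with gains 1.0 and 0.5 and a total power of 2, the allocation is 1.5 and 0.5; with a total power of 1, the weaker stream is switched off entirely.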

One  way  to  think  about  this  problem  of  user  selection/ordering  is  to  say  that  there 
are  two  issues  that  affect  multiuser  performance: 

1.  Orthogonality  among  receivers’  channel  vectors 

2.  Instantaneous  channel  strength,  ignoring  potential  interference 

In Section 4.3, we did not effectively make use of the second of these factors
because each receiver had to get information during each channel realization. With
fewer  constraints,  we  now  see  some  improvement  by  taking  this  new  information  into 
account. 

To see the relative importance of the second factor, compare Fig. 4-9 with Fig. 4-10.
The second figure shows the performance of a four-element array using the “precode
order” method described above, which takes both factors into account, as well
as a simpler method that only selects receivers based on their single-user SNRs, i.e.,
the second factor. Comparing the two figures, it appears that for zero-forcing
beamforming, most of the gains achieved by increasing the window size are due to selecting
more orthogonal channel vectors, while most of the precoding gain is from choosing
the strongest channels. (The two precoding curves in Fig. 4-10 would be a little
further apart for eight antenna elements, but “channel strength” still achieves better
performance than the orthogonality selection method.) This is because in precoding,
only the last couple of receivers (out of those receiving data) have to sacrifice
significant performance to avoid interference, while all receivers have this problem with
zero-forcing beamforming.

Figure 4-10: Ergodic sum capacity for zero-forcing beamforming and precoding with
a 4-element array and large delay constraints. “Max sum proposed” uses the method
of Section 3.2.2, while “Channel strength” uses only single-user SNR information.
Waterfilling across streams was used once the receivers were selected.

This has some promising implications for precoding. If most of the gains are
achieved by selecting receivers by their channel strengths without regard to
interference, then the complexity of the user selection and ordering methods can be
significantly reduced.

4.5 Tight Delay Constraints


So  far,  we  have  characterized  data  by  whether  the  allowable  delay  is  greater  or  less 
than  the  coherence  time  of  the  channel.  The  practical  distinction  was  whether  it 
is  reasonable  for  those  receivers  with  weak  instantaneous  channels  to  wait  for  their 
channel  strengths  to  improve  before  starting  communication.  In  either  case,  it  was 
assumed  that  this  allowable  delay  was  greater  than  several  packet  lengths,  so  that 
some  rearranging  and  spatial  multiplexing  is  tolerable. 

At  the  other  extreme  is  data  that  needs  to  be  received  as  soon  as  possible,  with 
delay constraints on the order of packet lengths. This might be true for very
time-dependent information, such as control signals for a physical system or critical sensor
data. One approach would be to transmit this delay-critical stream by itself, and
then resume with the usual scheduling procedures. However, if this stream does not
need quite all of the available resources, it might be possible to take advantage of
some  of  the  throughput  improvement  inherent  with  spatial  multiplexing. 

Suppose  that  receiver  one  needs  to  receive  a  packet  of  a  certain  size  by  some  given 
delay.  Equivalently,  it  needs  to  achieve  some  average  rate  over  that  time  span.  If 
the  transmitter  wishes  to  communicate  simultaneously  to  a  second  receiver,  it  should 
find the solution that maximizes the rate of the second stream given the constraint
on stream one’s rate. Unfortunately, as previously discussed, the multiple-receiver
rate region and the strategies that achieve it are unknown. Still, practical methods
such as beamforming or precoding may be able to increase the total throughput while
satisfying the first stream’s requirements.

One way to visualize this would be to look at capacity regions. Alternatively, we
could take a more direct view and consider delay. If the streams to both receivers have
the same amount of data, they could be sent at the same rate and finish
simultaneously. But if the first stream has the tighter delay constraint, it may require a higher
rate than this. Its packet will finish first, and then the transmitter can send the remaining
bits of the second stream’s packet at its highest possible rate, at full power along the
single-user beamforming direction. The opposite could be done if the second stream’s
packet finishes earliest. As shown in the example of Fig. 4-11, the delays at the two
receivers can be plotted against each other for various transmission strategies and
power distributions among the two streams. Any (delay_1, delay_2) pair that is exterior
to the curves is achievable (as opposed to capacity regions that are achievable if they
are interior to some boundary). Now, given a minimum delay constraint on stream
one, we can see how fast we can get stream two’s packet across.

Figure 4-11: Typical delay regions for certain two-receiver strategies, computed for a
sample channel matrix realization. A (delay_1, delay_2) point is achievable if it is on or
outside (i.e., up and to the right) of the boundaries shown.
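
The two-phase transmission described above maps directly onto a small delay calculation. The sketch below is an illustration under simplifying assumptions: rates are treated as constant within each phase, all rates are positive, and the function name `delay_pair` and its argument layout are hypothetical. It returns one point of the kind plotted in Fig. 4-11 for a given pair of multiplexed rates.

```python
def delay_pair(bits, joint_rates, solo_rates):
    """Delays (delay_1, delay_2) when both streams are multiplexed until one
    packet finishes, and the remainder is then sent at the single-user rate.

    bits        -- (B1, B2), packet sizes in bits
    joint_rates -- (r1, r2), rates while both streams are multiplexed
    solo_rates  -- (s1, s2), single-user beamforming rates at full power
    """
    t1 = bits[0] / joint_rates[0]  # time for stream one to finish if never alone
    t2 = bits[1] / joint_rates[1]
    if t1 <= t2:
        # Stream one finishes first; stream two sends its leftover bits alone.
        remaining = bits[1] - joint_rates[1] * t1
        return t1, t1 + remaining / solo_rates[1]
    # Otherwise stream two finishes first and stream one continues alone.
    remaining = bits[0] - joint_rates[0] * t2
    return t2 + remaining / solo_rates[0], t2
```

Sweeping the power split between the two streams changes `joint_rates` and traces out a boundary of achievable delay pairs.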

This perspective of communicating with both receivers until one of them is
finished, then sending any remaining bits to the second receiver, was inspired by the
“static broadcasting” setup of Shulman and Feder [58]. Their information theoretic
description was for a very general channel and dealt with sending common
information to both receivers, about which we will have more to say in Chapter 5. Instead of
delay, they plotted its inverse, corresponding to a kind of average rate. We find that
plotting  delay  relates  more  closely  to  the  goals  of  time-sensitive  data,  and  furthermore 
avoids  potential  confusion  over  average  versus  sustainable  rates. 

Other interesting properties come up in the delay plot. For example, timesharing
between the two single-user beamforming strategies does not result in a convex
combination of delay pairs, as it would with rate, but rather in the rectangular-shaped
curve shown as the dotted line in the figure. This timesharing therefore only results
in less desirable delay pairs. When multiplexing more than two streams, a
higher-dimensional plot can capture all of the achievable delay K-tuples.


4.6  Multiplexing  Different  Classes  of  Data 


We  have  explored  different  scheduling  strategies  based  on  a  few  general  levels  of 
delay  tolerance.  For  this  to  be  useful  in  many  realistic  systems,  these  ideas  should  be 
combined  into  a  single  framework  capable  of  dealing  with  a  mixture  of  data  classes, 
or  perhaps  even  a  continuum  of  priorities.  Additionally,  this  system  would  ideally 
be more amenable to including more concrete service guarantees. In this section, we
give some ideas on the direction such an effort may take, inspired by the previously
mentioned weighted fair queueing algorithm [53, 16].

Although a straightforward application of weighted fair queuing to spatial
multiplexing would not take proper account of the physical channel, it does provide a
starting point for incorporating different data priorities and rate guarantees. Given
a constant-rate data channel and set of weights $\phi_k$ on the streams, this algorithm
attempts to guarantee stream k a fraction

$$\frac{\phi_k}{\sum_i \phi_i}$$

of the overall rate, where the summation is over all streams that have data to send. For
packet-based serial transmission, Parekh and Gallager [53] describe a “virtual time”
implementation that guarantees that no packet will be delayed from a continuous-flow
ideal by more than the largest packet length. Suppose that a packet of length q
arrives from stream k at virtual time $t_{k,0}$. This packet is given the timestamp

$$t_{k,0} + \frac{q}{\phi_k}, \tag{4.4}$$

which specifies the finishing virtual time. If the queue already contains a packet from
this stream, then $t_{k,0}$ in (4.4) is replaced by the timestamp of the earlier packet.
Packets are serviced in increasing order of timestamp.
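
The timestamp rule can be sketched as a small priority queue. This is a simplified illustration, not the full algorithm of [53]: the caller is assumed to supply the virtual time of each arrival (the computation of virtual time itself is omitted), packet lengths are assumed positive, and the class name `TimestampQueue` and its methods are hypothetical.

```python
import heapq


class TimestampQueue:
    """Serve packets in increasing order of virtual finishing time."""

    def __init__(self, weights):
        self.weights = weights  # weight phi_k for each stream k
        self.pending = {}       # stream -> timestamps of its packets still queued
        self.heap = []          # entries: (timestamp, arrival order, stream, length)
        self.order = 0

    def arrive(self, stream, length, virtual_time):
        queued = self.pending.setdefault(stream, [])
        # If the stream already has a queued packet, start from its timestamp.
        start = queued[-1] if queued else virtual_time
        stamp = start + length / self.weights[stream]
        queued.append(stamp)
        heapq.heappush(self.heap, (stamp, self.order, stream, length))
        self.order += 1
        return stamp

    def serve(self):
        stamp, _, stream, length = heapq.heappop(self.heap)
        self.pending[stream].pop(0)  # a stream's oldest packet has its smallest stamp
        return stream, length, stamp
```

A stream with a larger weight accumulates timestamp more slowly per bit, so its packets tend to be served sooner.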

To  apply  this  idea  to  spatial  multiplexing  over  fading  channels,  we  interpret  (4.4) 
and  then  extend  it  to  this  new  context.  The  timestamp  can  be  seen  as  weighting  the 
time it will take to send the packet by $\phi_k$ and giving a credit for time spent waiting
in  the  queue.  These  ideas  can  also  apply  to  our  fading  channel  model,  although  we 
lose  some  of  the  strict  quality-of-service  guarantees.  If  each  transmission  segment  is 
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the  same  length  of  time  (but  constains  a  different  number  of  bits),  let  4,o  be  the  time 
segment  number  in  which  an  entry  arrives  in  the  queue.  Multiple  submissions  are  not 
an  issue  because  we  allow  a  stream  to  submit  only  one  entry  at  a  time.  Next,  let  L 
be  some  constant,  perhaps  a  threshold  number  of  bits  that  must  be  buffered  before 
a  stream  can  send  an  entry  to  the  queue.  To  establish  a  “hnishing  time”  for  this 
packet,  we  need  to  divide  by  the  rate  at  which  data  will  be  sent,  taking  into  account 
the  channel  state  and  the  other  receivers  to  be  spatially  multiplexed.  With  precoding 
(and  no  power  control),  this  rate  can  be  computed  from  the  channel  realization  and 
previously  selected  streams.  Under  beamforming,  the  system  can  estimate  the  value 
based  on  this  information.  After  normalizing  the  first  term  in  (4.4)  with  respect  to 
the  current  time  t,  we  thus  select  the  receiver  with  the  smallest 


$$(t_{k,0} - t) + \frac{L}{r_k(t)\,\phi_k}. \tag{4.5}$$


This again represents a weighted sum between the time spent in the queue and
the potential rate. Delay-tolerant streams will set a relatively small $\phi_k$ so that they
will be transmitted only when performance is very high or when the system is not
very busy. Streams that are more delay constrained will set a larger $\phi_k$ so that they
will  not  have  to  wait  very  long,  even  if  the  channel  is  not  very  strong.  Note  how  this 
scheme  allows  for  a  continuum  of  delay  tolerances,  rather  than  just  a  discrete  number 
of  classes.  However,  it  will  require  calibrating  the  weights  to  achieve  a  proper  balance 
between  the  two  terms  in  (4.5). 
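
As a rough sketch of how the selection rule in (4.5) might be evaluated, the following Python fragment computes the metric for each queued entry and picks the smallest. The dictionary keys, the function names and the example numbers are hypothetical; the rate `rate` stands for $r_k(t)$, which in practice would come from the channel realization and the previously selected streams.

```python
def metric(entry, t, L):
    """Metric of (4.5): (t_{k,0} - t) + L / (r_k(t) * phi_k)."""
    return (entry["arrival"] - t) + L / (entry["rate"] * entry["phi"])


def select_stream(entries, t, L):
    """Return the queued entry with the smallest metric at time segment t."""
    return min(entries, key=lambda e: metric(e, t, L))


# Hypothetical queue: rates are in bits per time segment, L in bits.
queue = [
    {"stream": "A", "arrival": 2, "rate": 100.0, "phi": 1.0},
    {"stream": "B", "arrival": 9, "rate": 500.0, "phi": 1.0},
    {"stream": "C", "arrival": 0, "rate": 50.0, "phi": 0.5},
]
chosen = select_stream(queue, t=10, L=1000.0)
print(chosen["stream"], [metric(e, 10, 1000.0) for e in queue])
```

In this example stream C has waited longest, but its small weight and weak channel keep its metric high, so the entry with the strong channel is served first.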

The  High  Data  Rate  (HDR)  mode  in  the  CDMA  2000  wireless  standard  [69]  and 
a  related  system  for  transmitter  arrays  [74]  include  many  of  the  same  issues  for  their 
timesharing-based  systems.  These  systems  attempt  to  schedule  each  stream  near  its 
peak channel quality while providing “proportional fairness,” so that channels with higher
average quality do not receive more than their share of timeslots. Mechanically, they
penalize for data recently sent (rather than crediting for time spent in the queue)
and maximize the rate (rather than minimizing its inverse). A version of
this  form  of  weighting  could  be  formulated  for  our  spatial  multiplexing  setup,  though 
again  the  higher  total  rates  may  come  at  the  expense  of  some  guarantees. 
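
For illustration only, the commonly described proportional-fair rule can be sketched as follows: in each slot, serve the stream with the largest ratio of current rate to recent average throughput, then update the averages with an exponential window. This is an assumption about the general form of such schedulers, not a reproduction of the exact rules in [69] or [74]; the function name, window length and rates are hypothetical.

```python
def proportional_fair(rates_per_slot, window=10.0, init=1.0):
    """Return the index of the stream served in each slot.

    rates_per_slot: list of per-slot lists, one supportable rate per stream.
    window:         averaging window (in slots) for recent throughput.
    """
    num = len(rates_per_slot[0])
    avg = [init] * num  # recent average throughput per stream
    choices = []
    for rates in rates_per_slot:
        # Maximize rate relative to what the stream has recently received.
        k = max(range(num), key=lambda i: rates[i] / avg[i])
        choices.append(k)
        for i in range(num):
            served = rates[i] if i == k else 0.0
            avg[i] = (1.0 - 1.0 / window) * avg[i] + served / window
    return choices


# Stream 0 always has ten times the rate of stream 1.
choices = proportional_fair([[10.0, 1.0]] * 1000)
share_of_weak_stream = choices.count(1) / len(choices)
print(choices[:10], share_of_weak_stream)
```

Even though stream 0 always has the better channel, the penalty for recently sent data lets stream 1 claim close to half of the timeslots after a short transient.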

Either of these two directions, inspired by weighted fair queuing and HDR, or
a more direct composite of the strategies from Sections 4.3-4.5, could serve as the
foundation for a scheduling algorithm that is more integrated across different types
of data. The two ideas discussed in this section would result in a smoother distribution
of delay times than the moderate-delay queuing scheme of Section 4.3.2, and
may  provide  an  easier  base  on  which  more  complex  networking-oriented  algorithms 
could  be  developed.  For  instance,  to  decouple  the  delay  and  rate  priorities,  it  may 
be  possible  to  add  in  some  of  the  ideas  from  service  curves  [15,  60].  On  the  other 
hand, the schedulers from the bulk of this chapter dealt more directly with the appropriate
optimization criteria for each data type. Due to the random nature of
the channel, any algorithm will have a hard time providing strict quality-of-service
guarantees. However, with enough potential receivers, the previously discussed robustness
of scheduling suggests that well-designed algorithms may be likely to achieve
reasonable goals in practice.

Chapter 5


Multicasting  of  Common 
Information 


We now consider more carefully, at both the scheduling and array processing blocks,
whether data streams are intended for single or multiple receivers. Previously, we
assumed that the scheduler would simply duplicate any streams that had multiple
recipients. However, in such multicast scenarios, it may be more
efficient to transmit data only once rather than repeating it in this way. The drawback
is that the transmitter must now satisfy the goals of all the recipients of this
stream  simultaneously.  Therefore,  an  investigation  into  the  potential  performance 
and  implementations  of  multicasting  is  needed. 

To facilitate analysis, we first consider the array processing of a single stream in
isolation, and later describe how to incorporate these ideas into the larger system
context. A useful exercise here is to consider the two extremes where the stream is
intended for a single receiver or for all possible receivers. In the first case, we have seen
that an optimal strategy, implementable by beamforming, is to ensure that the signals
from the different antenna elements coherently combine at the receiver. In the latter
case, the transmitter cannot effectively make use of its channel information since
the  data  must  be  received  at  all  possible  locations.  This  presents  a  good  application 
for  space-time  codes  that  do  not  take  into  account  any  channel  side  information  that 
the  transmitter  may  have.  Using  our  usual  independent  Rayleigh  fading  model,  the 
two  extremes  result  in  the  same  shape  of  the  received  SNR  distribution,  but  with  a 
factor  of  M  difference  in  magnitude,  where  M  is  the  number  of  transmitter  antenna 
elements. 




A multicast scenario is concerned with what happens in between, when a stream
is directed to a finite number of receivers. This leads to two fundamental questions:
Where does the performance fall within the spectrum of possibilities given above?
What transmission schemes are optimal or most useful in these cases? To even begin to
answer  these  questions  requires  a  more  precise  concept  of  performance,  since  schemes 
that  are  good  for  some  receivers  may  not  be  good  for  others.  In  Section  5.1,  we 
provide  such  a  discussion  of  performance  and  efficient  operating  points,  setting  up 
the  analysis  for  the  rest  of  the  chapter. 

We then examine techniques for different regimes and types of signaling. Beamforming
strategies, analyzed in Section 5.2 and Section 5.3, are most useful when the
number of receivers is small or when the transmitter can signal over many channel
variations. For other scenarios, a potentially more complex class of schemes, of which
beamforming is a subset, may be necessary. We investigate properties and implementations
of this more general class, which we refer to as space-time multicast coding,
in Section 5.4. In one example with an eight-element array and eight receivers, these
codes achieve up to a 6 dB gain over ordinary space-time codes that do not incorporate
channel knowledge.

Finally, we connect multicasting back to the larger system point of view in Section 5.5.
In many cases, it is possible to transmit several multicast streams simultaneously,
and these can be sent alongside receiver-specific streams.


5.1  Overview  of  Multicast 

At  the  heart  of  multicast  is  an  attempt  to  satisfy  the  goals  of  a  number  of  receivers 
using  a  single  transmission  strategy.  Because  the  receivers  will  experience  distinct 
realized  channel  vectors,  with  correspondingly  distinct  optimal  strategies,  selecting 
the  multicast  parameters  often  requires  a  balance  among  conflicting  objectives. 

A  related  issue  shows  up  when  separate  streams  are  directed  to  different  receivers. 
For this scenario (often called a broadcast channel), researchers in information theory
have long used the concept of rate regions, which describe all achievable rate $K$-tuples
to the $K$ receivers. Without arrays, superposition coding [14] or dirty-paper
coding [13, 10, 84] is sufficient to achieve all points in the rate region, while the
region is not completely known when transmitting from an array. Earlier in this
thesis, we described different array processing methods for various types of tradeoffs
among receivers. As we begin to consider sending common information, we encounter
important  differences  from  these  situations.  For  example,  with  multicast,  interference 
is  no  longer  an  issue;  while  with  multiplex,  the  transmitter  can  do  more  optimization 
on  the  signaling  to  different  receivers.  These  details  will  lead  to  different  tradeoffs, 
though  many  of  the  same  fundamental  concepts  will  appear. 

For a given fading channel realization, consider the following hypothetical tool
for capturing the benefits of different multicast strategies. Imagine a graphical plot
with separate axes for each of the $K$ intended receivers, denoting some appropriate
measure of performance. This might be SNR for an uncoded system or rate for a coded
one. Then, for a given fading channel realization, every transmission strategy would
correspond to a $K$-dimensional point in “performance space.” Once all admissible
strategies  (or  strategies  of  a  given  type)  have  been  plotted,  the  various  tradeoffs 
among  the  different  receivers  should  become  clear  and  the  transmitter  can  select  an 
appropriate  operating  point  based  upon  system  goals.  By  repeating  this  procedure 
across  many  fading  channel  realizations,  one  could  also  compute  statistics  over  the 
random  ensemble.  In  this  way,  decisions  can  be  made  based  on  individual  receiver  or 
system-wide  goals,  outage  or  ergodic  capacity  measures. 

In the execution of this plan, care must be taken to ensure that the performance
characterization is well-defined. The relationship between SNR and uncoded performance
may only be clear in certain specialized cases, such as an additive Gaussian
noise channel. For a coded system, the transmitter must choose codewords at a particular
rate, even though different receivers may have the potential to reliably receive
a range of rates. Therefore, a simple rate region interpretation is not sufficient. We
will address these concerns with careful definitions and, at times, special cases.

With coded transmission, we resolve the issue by defining the performance axes
in terms of mutual information rather than capacity. Given a particular input distribution
and channel realization, this mutual information can be computed for each
receiver  and  represents  the  maximum  reliable  rate  of  communication  over  that  link. 
The  achievable  region  then  takes  on  different  interpretations  depending  on  the  type 
of  signaling  used: 

•  If  the  transmitter  signals  over  a  particular  fading  realization  at  coded  rate  R, 
any  receivers  with  mutual  information  of  at  least  R  for  this  signaling  scheme 
will  be  able  to  reliably  decode  the  data.  Timesharing  over  different  strategies 
(within the same channel realization) is also possible.
•  If the transmitter signals over an ergodically varying channel, it can use a code
with rate $R$ and interleave symbols over all channel realizations. Any receiver
with expected mutual information of at least $R$ will be able to reliably decode
the data. (To get this ergodic behavior, the schemes used at each realization
must employ the same input distribution [31, 46], but we will see that this is
automatically satisfied for our encoding schemes.)

Looking at the above descriptions, two particular strategies stand out. For a single
channel realization, one could maximize the minimum mutual information among
receivers and therefore achieve the highest rate that all can reliably decode. Alternatively,
for ergodic signaling, one could maximize the sum of rates among receivers at
each  realization.  This  strategy  then  maximizes  the  rate  of  common  information  if  all 
receivers  undergo  i.i.d.  channel  variations  over  the  same  fading  distribution.  We  will 
discuss  other  operating  points  of  interest  throughout  this  chapter  as  well. 
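
To make the two criteria concrete, here is a small Python sketch that picks operating points from a table of per-receiver mutual information values for one channel realization. The strategy names and all numbers are hypothetical and serve only to show how the max-min and max-sum selections can differ.

```python
def decodable(mutual_info, rate):
    """Indices of receivers whose mutual information is at least the coded rate."""
    return [k for k, value in enumerate(mutual_info) if value >= rate]


def max_min_strategy(table):
    """Strategy with the largest rate that every receiver can decode."""
    return max(table, key=lambda name: min(table[name]))


def max_sum_strategy(table):
    """Strategy with the largest sum of mutual information over receivers."""
    return max(table, key=lambda name: sum(table[name]))


# Hypothetical mutual information (bits per symbol) for two receivers.
table = {
    "beam_to_1": [4.0, 0.5],
    "compromise": [2.0, 1.8],
    "beam_to_2": [0.4, 3.2],
}
print(max_min_strategy(table), max_sum_strategy(table))
print(decodable(table["beam_to_1"], rate=1.0))
```

Here the compromise strategy supports a common rate of 1.8 for both receivers on this realization, while the strategy favoring receiver 1 has the larger sum and would be preferred under the ergodic criterion.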

Our  first  investigation,  however,  will  be  over  the  particular  subset  of  transmission 
schemes  corresponding  to  beamforming.  These  perform  well  for  small  numbers  of 
receivers (e.g., they are optimal for transmitting to two receivers, as we will show
in Section 5.4.2) or with ergodic capacity goals. Furthermore, they lead to low-complexity
transmission and reception techniques and are compatible with both coded
and  uncoded  modulation.  In  Section  5.4,  we  will  return  to  the  more  general  scenario 
and  discuss  optimal  strategies  and  useful  implementations. 

5.2  Operating  Points  for  Beamforming 

In this section, we consider multicast solutions where the transmitter uses a beamforming
strategy to send identical information to $K$ receivers. The vector of antenna
element outputs consists of a single input symbol multiplied by a vector of weights
$\mathbf{g}$, resulting in coherent combining at some potential location. Each receiver gets a
scaled copy of the input stream plus noise, so received SNR is a valid measure for
uncoded performance (or $\log_2(1 + \mathrm{SNR})$ for mutual information in coded systems).

As described in the previous section, when the antenna weights $\mathbf{g}$ are selected,
there is an associated point $(\mathrm{SNR}_1, \mathrm{SNR}_2, \ldots, \mathrm{SNR}_K)$ in $K$-dimensional “SNR-space”
that describes the SNRs experienced at the receivers in the system. Moreover,
given the transmitter power constraint, there is a well-defined surface that
defines the boundary of those points that are attainable. We refer to this frontier of
achievable points as the “transmitter operating characteristic” (TOC) for the realized
channel and power constraint. As will become apparent, a transmitter operates
efficiently if and only if it results in an SNR vector lying on the TOC.

Using $K = 2$ receivers for illustration, the TOC can be described as the set of
received SNR pairs $(\mathrm{SNR}_1, \mathrm{SNR}_2)$ for which $\mathrm{SNR}_1$ is maximized subject to various
thresholds on $\mathrm{SNR}_2$. This frontier is equivalently traced out by maximizing

$$\alpha_1\,\mathrm{SNR}_1 + \alpha_2\,\mathrm{SNR}_2 \tag{5.1}$$

with various nonnegative weights $\alpha_1$ and $\alpha_2$, again subject to the system power constraint.
In enumerating points on the TOC, also note that it is only useful to send
energy in a direction in the span of $\mathbf{h}_1$ and $\mathbf{h}_2$.


It can be shown from the characterization of the TOC that the two SNR pairs (5.2),
which correspond to beamforming directly to the first and second receivers, respectively,
must lie on the TOC. This follows because one of the receivers experiences
the maximum possible SNR in each case.
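
For reference, the two single-receiver beamforming pairs can be sketched as follows. This is an illustration under stated assumptions rather than the original display: receiver $k$ is assumed to observe $y_k = \mathbf{h}_k^\dagger \mathbf{x}$ plus noise of power $\mathcal{N}_0$, the transmitter sends with power $\mathcal{P}$ along a unit-norm $\mathbf{g}$, and beamforming directly to receiver $k$ means $\mathbf{g} = \mathbf{h}_k / \|\mathbf{h}_k\|$. Under these assumptions the pairs $(\mathrm{SNR}_1, \mathrm{SNR}_2)$ would be

$$\left( \|\mathbf{h}_1\|^2\,\frac{\mathcal{P}}{\mathcal{N}_0},\; \frac{|\mathbf{h}_2^\dagger \mathbf{h}_1|^2}{\|\mathbf{h}_1\|^2}\,\frac{\mathcal{P}}{\mathcal{N}_0} \right) \quad\text{and}\quad \left( \frac{|\mathbf{h}_1^\dagger \mathbf{h}_2|^2}{\|\mathbf{h}_2\|^2}\,\frac{\mathcal{P}}{\mathcal{N}_0},\; \|\mathbf{h}_2\|^2\,\frac{\mathcal{P}}{\mathcal{N}_0} \right).$$

In each pair, the targeted receiver sees the full array gain $\|\mathbf{h}_k\|^2$, while the other receiver sees only the projection of its channel onto the chosen direction.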


Returning to the weighted sum of SNRs formulation of (5.1), these two points
correspond to $\alpha_2 = 0$ or $\alpha_1 = 0$. On the other hand, when $\alpha_1 = \alpha_2$, we wish to find
a beamforming vector $\mathbf{g}$ to maximize the sum of SNRs,

$$\max_{\|\mathbf{g}\| \le 1} \sum_{k} \mathrm{SNR}_k,$$

which is equivalent to

$$\max_{\|\mathbf{g}\| \le 1} \|H\mathbf{g}\|^2.$$

This is the well-known matrix norm problem, which is solved by performing a singular
value decomposition

$$H = U \Sigma V^\dagger$$

and letting $\mathbf{g}$ be the first column of $V$. Then

$$\mathrm{SNR}_k = |u_{k,1}|^2\,\lambda_{\max}\,\frac{\mathcal{P}}{\mathcal{N}_0}, \tag{5.3}$$

where $\lambda_{\max}$ is the largest eigenvalue of $H^\dagger H$, or equivalently the square of the largest
singular value of $H$, and $u_{k,1}$ is the $k$th entry of the first column of $U$. For any
other combination of $(\alpha_1, \alpha_2)$ weights, the same procedure can be performed after
first premultiplying $H$ by $\mathrm{diag}(\sqrt{\alpha_1}, \sqrt{\alpha_2})$.

Figure 5-1: The curve shown describes a typical frontier of achievable received SNR
pairs when transmitting common information from an 8-element array to two receivers,
at an input SNR per link of 5 dB. SNR pairs are achievable if and only if
they lie on or inside this transmitter operating characteristic (TOC). Various operating
points of interest are also shown.
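
The maximum sum of SNRs beamformer can be sketched numerically without a full singular value decomposition. The fragment below uses power iteration on $H^\dagger H$, which is a swapped-in technique chosen to keep the example dependency-free; it assumes that row $k$ of `H` gives the coefficients applied to the antenna outputs at receiver $k$, that the start vector is not orthogonal to the principal direction, and that `snr_in` stands for $\mathcal{P}/\mathcal{N}_0$. All function names and the example matrix are hypothetical.

```python
import math


def apply_channel(H, g):
    """Noise-free received amplitudes: entry k is sum_m H[k][m] * g[m]."""
    return [sum(h * x for h, x in zip(row, g)) for row in H]


def max_sum_beamformer(H, iters=200):
    """Unit-norm g maximizing ||H g||^2, and the resulting largest eigenvalue."""
    K, M = len(H), len(H[0])
    g = [1.0 + 0.0j] * M  # assumed not orthogonal to the principal direction
    for _ in range(iters):
        y = apply_channel(H, g)
        # Multiply by the conjugate transpose of H to complete one step.
        w = [sum(H[k][m].conjugate() * y[k] for k in range(K)) for m in range(M)]
        norm = math.sqrt(sum(abs(v) ** 2 for v in w))
        g = [v / norm for v in w]
    lam_max = sum(abs(v) ** 2 for v in apply_channel(H, g))
    return g, lam_max


def received_snrs(H, g, snr_in=1.0):
    """Per-receiver SNR for beamforming vector g at input SNR snr_in."""
    return [snr_in * abs(v) ** 2 for v in apply_channel(H, g)]


H = [[2, 0], [1, 1j]]  # two receivers, two antenna elements
g, lam_max = max_sum_beamformer(H)
snrs = received_snrs(H, g)
print(lam_max, snrs)
```

For this example the largest eigenvalue of $H^\dagger H$ is $3 + \sqrt{5}$, and the per-receiver SNRs add up to exactly that value, in line with the structure of (5.3).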

The TOC curve for a particular channel realization and power constraint is depicted
in Fig. 5-1. Note that the SNRs are plotted in normal units rather than in dB.
This will aid in geometric interpretations and properties; for example, the two-receiver
achievable region for beamforming, when plotted in this way, is always convex. We
will prove this statement and discuss its implications in the later discussion on optimality
(in Section 5.4.2). For now, this says that timesharing between two operating
points on the TOC cannot improve the time-average SNR.

By  its  location  on  the  boundary  of  attainable  SNR  pairs,  any  point  on  the  TOC 
represents  a  strategy  where  the  transmitter  is  operating  efficiently.  To  select  among 
them,  it  is  up  to  the  system  designer  to  supply  a  particular  performance  criterion 
that  is  appropriate  to  the  given  application. 

Many operating points of interest can be developed from the TOC. The two circles
‘○’ correspond to the single-user beamforming points (5.2). To maximize the
minimum performance among receivers, discussed in the previous section as a way to
ensure that both receivers achieve sufficient quality, one operates at the intersection
of the TOC with the line $\mathrm{SNR}_1 = \mathrm{SNR}_2$; in Fig. 5-1 this point is indicated via the
symbol ‘▽’. In scenarios where the line does not intersect the solid TOC curve, we
operate at the nearest of the points (5.2). In other cases, maximizing the average (or,
equivalently, total) SNR over all receivers is more appropriate. This is achieved by
operating at the point where the TOC has slope $-1$; in Fig. 5-1 this point is indicated
via the symbol ‘◇’, and corresponds to weights in (5.1) satisfying $\alpha_1 = \alpha_2$. A similar
operating point can be found for maximizing the sum of mutual information across
receivers by regraphing the TOC in terms of $\log_2(1 + \mathrm{SNR})$; we saw how this is useful
for maximizing ergodic capacity when the transmitter can code across many channel
realizations.
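
As a sketch of how these operating points could be located numerically for two receivers, the fragment below sweeps the weights of (5.1), scales the two channel rows by the square roots of the weights, finds the maximizing direction by power iteration, and then picks the sweep point with the largest minimum SNR. It assumes unit input SNR, a discrete sweep (so the max-min point is only approximate in general), and a start vector that is not orthogonal to the principal direction; the function names and channel values are hypothetical.

```python
import math


def principal_direction(H, iters=300):
    """Unit-norm g maximizing ||H g||^2, found by power iteration."""
    K, M = len(H), len(H[0])
    g = [1.0 + 0.0j] * M  # assumed not orthogonal to the principal direction
    for _ in range(iters):
        y = [sum(H[k][m] * g[m] for m in range(M)) for k in range(K)]
        w = [sum(H[k][m].conjugate() * y[k] for k in range(K)) for m in range(M)]
        norm = math.sqrt(sum(abs(v) ** 2 for v in w))
        g = [v / norm for v in w]
    return g


def toc_points(row1, row2, steps=100):
    """Sweep (alpha_1, alpha_2) = (a, 1 - a) and return (a, SNR_1, SNR_2) triples."""
    points = []
    for i in range(steps + 1):
        a1 = i / steps
        a2 = 1.0 - a1
        weighted = [[math.sqrt(a1) * v for v in row1],
                    [math.sqrt(a2) * v for v in row2]]
        g = principal_direction(weighted)
        snr1 = abs(sum(h * x for h, x in zip(row1, g))) ** 2
        snr2 = abs(sum(h * x for h, x in zip(row2, g))) ** 2
        points.append((a1, snr1, snr2))
    return points


points = toc_points([1.0, 0.0], [0.6, 0.8])
max_min_point = max(points, key=lambda p: min(p[1], p[2]))
print(max_min_point, points[-1])
```

Because the two example channels have equal norms, the max-min point coincides with the equal-weight point and gives both receivers an SNR of 0.8, whereas beamforming directly to the first receiver gives the pair (1, 0.36).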

Operating points other than the max-min point (‘▽’) are useful even when signaling
over individual channel realizations. In some applications like voice transmission,
one  receiver  may  have  higher  fidelity  requirements  than  the  other.  Other  times,  it 
may  be  important  that  information  gets  across  to  one  receiver  very  quickly.  In  these 
cases,  after  the  data  is  sent  at  a  high  rate  that  the  first  receiver  can  understand, 
additional  symbols  can  be  sent  (perhaps  along  a  different  beamforming  direction)  to 
the  second  receiver.  This  is  the  idea  behind  Shulman  and  Feder’s  static  broadcasting 
[58],  which  was  developed  in  a  very  general,  information  theoretic  model.  A  practical 
implementation  may  include  the  use  of  rate-compatible  punctured  codes  [35].  First, 
a  high-rate,  punctured  code  is  transmitted  that  the  first  receiver  can  decode.  Then 
the missing bits are sent, which combine with the first set to form the lower-rate code
for  the  second  receiver.  These  rate-compatible  codes  sacrifice  very  little  optimality 
over  the  best  known  codes  of  the  same  rates  (at  least  as  of  the  publication  of  [35]). 




5.3  Maximizing  Average  SNR  per  Receiver 


We now proceed to answer some quantitative questions about the performance of multicast,
using the operating point that maximizes the sum of SNRs to the $K$ receivers.
This  point,  achievable  with  beamforming,  is  amenable  to  analysis  and  gives  an  upper 
bound  on  the  average  per-receiver  SNR.  In  this  way,  it  provides  information  on  where 
multicast  scenarios  may  fall  between  the  extremes  of  single-receiver  transmission  and 
communication  with  all  possible  receivers.  Throughout,  we  assume  that  all  receivers 
have  the  same  Rayleigh  fading  distribution. 

This discussion also serves to illustrate advantages and disadvantages of beamforming
strategies. If the channel coefficients undergo independent, ergodic channel
variations then the operating point under consideration maximizes the time-average
SNR among all receivers. This offers a low-complexity approximation to maximizing
the common ergodic capacity among receivers and, as we will see, achieves significant
gains over scenarios where the transmitter does not have channel knowledge. On
the  other  hand,  when  signaling  over  a  single  fading  realization,  these  strategies  often 
provide  some  receivers  with  very  good  performance  at  the  expense  of  others.  This 
makes  the  outage  characteristic  degrade  rapidly  as  more  receivers  are  added. 


5.3.1  Average  Performance  Per  Receiver 

We begin by investigating average SNR per receiver, without regard to how performance
is actually distributed among the different receivers. This will help characterize
the  potential  of  multicasting,  and  in  particular  the  value  of  using  channel  information 
available  at  the  transmitter  as  the  number  of  receivers  grows. 

The properties of interest can be derived by analyzing the eigenvalues of certain
random matrices. To achieve the maximum sum of SNRs, the beamforming vector
$\mathbf{g}$ is set to the eigenvector corresponding to the maximum eigenvalue of $H^\dagger H$. The
average SNR per receiver then scales with the largest eigenvalue:

$$\text{Average SNR per receiver} = \frac{\lambda_{\max}(H^\dagger H)}{K}\,\frac{\mathcal{P}}{\mathcal{N}_0}. \tag{5.4}$$

We then exploit the fact that, with Rayleigh fading, the matrix $H^\dagger H$ has a complex Wishart
distribution [48] when $K > M$; when $K < M$ it is the matrix $HH^\dagger$ that is Wishart
distributed.

When the number of antennas $M$ and receivers $K$ are moderate to large, we can
take advantage of asymptotic properties of Wishart matrices. It can be shown that
when $M$ and $K$ approach infinity in such a way that the ratio $M/K$ of transmitter
antenna elements per receiver approaches a positive constant, then the largest eigenvalue
of the associated Wishart matrix converges almost surely [28, 18], resulting
in

$$\text{Average SNR per receiver} \to \left(1 + \sqrt{M/K}\right)^2 \frac{\mathcal{P}}{\mathcal{N}_0}. \tag{5.5}$$


This asymptotic behavior is shown by the solid curve in Fig. 5-2, from which we see
that the SNR growth is effectively linear in the ratio $M/K$ of antenna elements to
receivers for moderate to high ratios. Moreover, when the number of antenna
elements $M$ is significantly larger than the number of receivers $K$, there is a gain of
approximately 3 dB in SNR for every doubling of $M$. We stress that the limit in (5.5)
is no longer random, but rather a deterministic result that holds for almost all channel realizations.
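
These trends can be checked with a few lines of Python. The sketch assumes that the limit in (5.5) takes the standard form $(1 + \sqrt{M/K})^2\,\mathcal{P}/\mathcal{N}_0$ for the largest eigenvalue of a Wishart matrix; the function names are hypothetical and `snr_in` stands for $\mathcal{P}/\mathcal{N}_0$.

```python
import math


def asymptotic_avg_snr(ratio, snr_in=1.0):
    """Assumed limiting average SNR per receiver for M/K = ratio."""
    return (1.0 + math.sqrt(ratio)) ** 2 * snr_in


def to_db(x):
    """Convert a power ratio to decibels."""
    return 10.0 * math.log10(x)


for ratio in (0.0, 0.25, 1.0, 4.0, 1000.0, 2000.0):
    print(ratio, round(to_db(asymptotic_avg_snr(ratio)), 2))

doubling_gain_db = to_db(asymptotic_avg_snr(2000.0)) - to_db(asymptotic_avg_snr(1000.0))
print(doubling_gain_db)
```

At a ratio of zero the value reduces to the input SNR, at a ratio of one it is four times (about 6 dB) larger, and for large ratios each doubling of $M$ adds just under 3 dB.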


It is also worth emphasizing that a ratio of $M/K = 0$ means that $M$ grows much
more slowly than $K$, i.e., $M = o(K)$. A special case corresponds to using a fixed
number of transmit antenna elements $M$ while allowing the number of receivers $K$ to
increase to infinity. Because a transmitter cannot effectively tailor a beamforming
strategy to a very large number of receivers, it is not surprising that this ratio leads
to an average value of $\mathcal{P}/\mathcal{N}_0$, the same as if channel information were not available.


Also shown in Fig. 5-2 are expected values for representative scenarios involving
antennas with finitely many elements and finite receiver populations (using Monte
Carlo simulations). As the plot reflects, the asymptotic behavior of (5.5) is approximated
reasonably closely for even moderate values of $M$ and $K$.


For finite values of $M$ and $K$, the average SNR per receiver is a random variable
whose value depends on the realized channel. If more accurate performance statistics
are desired, it is possible to calculate the probability
distribution of the values the SNR may take on. In particular, the joint
distribution of all the eigenvalues $\lambda_i$ of a Wishart matrix $H^\dagger H$ with $K \ge M$, where $H$ has i.i.d.
Gaussian entries of variance one, is [18]

$$f(\lambda_1, \ldots, \lambda_M) = \frac{\prod_{i=1}^{M} \lambda_i^{K-M} e^{-\lambda_i} \prod_{i<j} (\lambda_i - \lambda_j)^2}{\prod_{i=1}^{M} \Gamma(K - i + 1)\,\Gamma(M - i + 1)}, \tag{5.6}$$

where $\Gamma(\cdot)$ denotes the usual Gamma function. Following Edelman [18], the density of the largest
eigenvalue can be computed by integrating over all but one of the $\lambda_i$ and dividing
by $(M - 1)!$ to remove the arbitrary ordering of the eigenvalues. When $M = 2$, the
resulting probability density for the largest eigenvalue is

$$f_\lambda(\lambda) = \frac{e^{-\lambda} \lambda^{K-2} \left[ \lambda^{K} e^{-\lambda} - K \lambda^{K-1} e^{-\lambda} + \left( \lambda^2 + (K-1)(K - 2\lambda) \right) \gamma(K-1, \lambda) \right]}{(K-1)!\,(K-2)!}, \tag{5.7}$$

where

$$\gamma(b, a) = \int_0^a t^{b-1} e^{-t}\,dt \tag{5.8}$$

is the incomplete Gamma function. From these probability functions and (5.4), it is
possible to numerically calculate detailed average SNR statistics over the ensemble of
possible channel realizations.

Figure 5-2: Expected average SNR per receiver for various values of $M/K$ and an
input SNR per link of 0 dB. The solid line shows the deterministic asymptotic values
when both $M$ and $K$ go to $\infty$ with the ratio $M/K$ held fixed. The dashed curves
denote representative points corresponding to finite $M$ and $K$ for $K = 4$ (‘◇’) and
$K = 8$ (‘▽’), from simulations.
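
As an example of such a numerical calculation, the following sketch evaluates the $M = 2$ density of the largest eigenvalue and checks that it integrates to one. It assumes the density has the form $e^{-\lambda}\lambda^{K-2}[\lambda^K e^{-\lambda} - K\lambda^{K-1}e^{-\lambda} + (\lambda^2 + (K-1)(K-2\lambda))\,\gamma(K-1,\lambda)] / ((K-1)!\,(K-2)!)$ with integer $K \ge 2$, uses the closed form of the incomplete Gamma function for integer arguments, and truncates the integral at an arbitrary upper limit; the function names are hypothetical.

```python
import math


def lower_gamma_int(n, x):
    """Incomplete Gamma function gamma(n, x) for a positive integer n."""
    partial = sum(x ** j / math.factorial(j) for j in range(n))
    return math.factorial(n - 1) * (1.0 - math.exp(-x) * partial)


def largest_eig_pdf(lam, K):
    """Assumed density of the largest eigenvalue for M = 2 and integer K >= 2."""
    g = lower_gamma_int(K - 1, lam)
    bracket = (lam ** K * math.exp(-lam)
               - K * lam ** (K - 1) * math.exp(-lam)
               + (lam ** 2 + (K - 1) * (K - 2 * lam)) * g)
    scale = math.factorial(K - 1) * math.factorial(K - 2)
    return math.exp(-lam) * lam ** (K - 2) * bracket / scale


def simpson(f, a, b, n=6000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


for K in (2, 3, 5):
    mass = simpson(lambda x: largest_eig_pdf(x, K), 0.0, 60.0)
    mean = simpson(lambda x: x * largest_eig_pdf(x, K), 0.0, 60.0)
    # By (5.4), the expected average SNR per receiver is mean / K times the input SNR.
    print(K, round(mass, 6), round(mean / K, 4))
```

For $K = 2$ the mean of the largest eigenvalue comes out as 3.5, so the expected average SNR per receiver is 1.75 times the input SNR.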


5.3.2  Individual  Receiver  Performance 


While  the  average  SNR  per  receiver  may  be  a  useful  characterization  of  overall  system 
performance,  it  does  not  reflect  the  behavior  experienced  by  any  individual  receiver 
in  the  system.  In  this  section,  we  focus  on  the  individual-receiver  outage  and  ergodic 
capacity. 

To determine the distribution of an individual receiver’s SNR under maximum
sum of SNRs beamforming, we begin by repeating (5.3):

$$\mathrm{SNR}_k = |u_{k,1}|^2\,\lambda_{\max}\,\frac{\mathcal{P}}{\mathcal{N}_0},$$

where $\lambda_{\max}$ is the largest eigenvalue of a Wishart-distributed matrix. Also, $u_{k,1}$ is an
entry from the random circular unitary matrix $U$ from the singular value decomposition
of $H$. The probability density of $|u_{k,1}|^2$ is (see, e.g., [50])

$$f_{|u_{k,1}|^2}(\mu) = \begin{cases} (K-1)(1-\mu)^{K-2}, & 0 < \mu < 1, \\ 0, & \text{otherwise.} \end{cases} \tag{5.9}$$

The marginal distribution for $\mathrm{SNR}_k$ can then be computed since the random variables
$\lambda_{\max}$ and $|u_{k,1}|^2$ are independent: the principal eigenvector of $HH^\dagger$ has no preferred
direction [19]. In the limiting case of $K \to \infty$ and $M$ finite, it is straightforward to
verify that $\mathrm{SNR}_k$ has the same exponential distribution as for a beamforming strategy
that ignores side information. In principle, the SNR distribution can be computed
analytically for any number of antennas or receivers. These computations quickly
become very cumbersome, however, so in the discussion below, we plot results from
simulations.
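
A quick Monte Carlo check of the density of $|u_{k,1}|^2$ can be sketched as follows. It assumes that a column of $U$ is uniformly distributed on the unit sphere in $K$ complex dimensions, and samples such a vector by normalizing i.i.d. complex Gaussian entries rather than by computing a singular value decomposition. The comparison uses the distribution function $1 - (1-\mu)^{K-1}$ obtained by integrating the density $(K-1)(1-\mu)^{K-2}$; the function names, seed and sample size are hypothetical.

```python
import random


def sample_entry_sq(K, rng):
    """Squared magnitude of one entry of a random unit vector in K complex dimensions."""
    z = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(K)]
    total = sum(abs(v) ** 2 for v in z)
    return abs(z[0]) ** 2 / total


def cdf(mu, K):
    """Distribution function implied by the density (K-1)(1-mu)^(K-2) on (0, 1)."""
    return 1.0 - (1.0 - mu) ** (K - 1)


rng = random.Random(1)
K = 4
samples = [sample_entry_sq(K, rng) for _ in range(20000)]
empirical_mean = sum(samples) / len(samples)
empirical_cdf = sum(1 for s in samples if s <= 0.2) / len(samples)
print(empirical_mean, 1.0 / K)
print(empirical_cdf, cdf(0.2, K))
```

The mean of $1/K$ reflects the fact that the $K$ squared magnitudes in a unit-norm column must sum to one.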

Maximum sum of SNRs beamforming performs well when signaling over many
fading realizations. In Fig. 5-3, we plot the ergodic capacity (equal for all receivers)
when multicasting from an 8-element array to as many as twenty receivers. For comparison,
we also plot the performance of an ideal space-time code that does not take
channel information into account. Note that at this input SNR level, the transmitter
can communicate with twenty receivers simultaneously at a higher rate than is available
by repeating the stream to two receivers separately with round-robin scheduling
(at half the single-user rate for each).

Figure 5-3: Single-user ergodic capacity when multicasting a stream from an 8-element
array to a number of receivers, at an input SNR per link of 5 dB. Also shown is a
curve for a space-time code that does not make use of channel knowledge and achieves
received SNR $= \|\mathbf{h}_k\|^2 / M \cdot \mathcal{P}/\mathcal{N}_0$.

The  outage  experienced  during  individual  channel  realizations  does  not  fare  as 
well.  In  Fig.  5-4,  we  see  that  the  outage  probability  for  maximum  sum  of  SNRs 
beamforming  degrades  considerably  as  more  receivers  are  added,  although  it  does 
remain  superior  to  transmission  from  a  single  antenna  element.  Other  beamforming 
solutions  may  do  somewhat  better,  but  the  outage  characteristic  will  still  suffer  as  the 
number  of  receivers  gets  large.  This  is  because  the  coherent  combining  that  occurs 
with  beamforming  also  induces  nulls  at  one  or  more  geographic  locations.  Therefore, 
beamforming  strategies  are  most  useful  when  the  number  of  receivers  is  fairly  small 
or when performance is averaged across many channel realizations.

Figure 5-4: Single-user outage probabilities when multicasting a stream from an 8-element
array to a number of receivers, at an input SNR per link of 5 dB. Also shown
are curves for a space-time code that does not make use of channel knowledge.

5.4  General  Space-Time  Multicast  Coding 

We  now  turn  to  more  general  transmission  schemes,  which  we  refer  to  as  space-time 
multicast  coding.  They  can  have  higher  complexity  than  beamforming,  but  are  able 
to  achieve  a  more  equitable  distribution  of  performance  among  the  different  receivers. 
Furthermore,  we  show  that  they  achieve  all  possible  operating  points  from  the  mutual 
information  point  of  view. 

5.4.1  Optimal  Structures 

In space-time multicast coding, the outputs at the different antenna elements can
be described using an arbitrary covariance matrix. There are many possible implementations,
but the structure of Fig. 5-5 is particularly useful for analysis. The data
stream is encoded to produce a complex Gaussian sequence of coded symbols that
is i.i.d., zero-mean, circularly symmetric, and has variance $\mathcal{P}$. Such encoders appear
often in the information theory literature. This sequence is then split into a number
of parallel sequences and then undergoes a linear transformation described by an arbitrary
matrix $G$ to produce the antenna element outputs. Note that this procedure
reduces to beamforming in the special case where $G$ is a single column vector. To
satisfy the power constraint, we impose $\mathrm{trace}(GG^\dagger) \le 1$.

Figure 5-5: Possible structure for space-time multicast coding.

We first show that this structure is sufficient to achieve all possible operating
points  for  a  coded  system,  and  then  go  on  to  describe  properties  and  interpretations. 

Proposition  3  Suppose  a  transmitter  sends  information  from  an  M-element  array 
to  K  receivers.  The  entire  frontier  of  efficient  operating  points,  in  terms  of  mutual 
information K-tuples, is achievable by space-time multicast coding as described in
Fig.  5-5. 

To prove this, note that space-time multicast coding sends a zero-mean, jointly
Gaussian vector of antenna element outputs $x$ with covariance $\Gamma_x = G G^\dagger V$. The mutual
information at receiver $k$ is then equal to [50]

$$ I(y_k; x) = \log_2\left(1 + \frac{h_k^\dagger \Gamma_x h_k}{\mathcal{N}_0}\right). \tag{5.10} $$

Now consider any other scheme. The mutual information at receiver $k$ will be

$$ I(y_k; x) = H(y_k) - H(y_k \mid x) = H(y_k) - \log_2(2\pi e \mathcal{N}_0), \tag{5.11} $$

where $H(\cdot)$ denotes the entropy of a random variable. The vector of antenna element
outputs $x$ should be zero-mean, because any power that is used in the mean will
not contribute to the mutual information. Using the Cholesky factorization of the
covariance matrix $\Gamma_x$, the vector $x$ can always be written as

$$ x = L s, $$

where $s$ is a length-$M$ vector of uncorrelated random variables, each with variance
$V$, and $L$ is a lower-triangular matrix such that $\Gamma_x = L L^\dagger V$ and $\mathrm{trace}\{L^\dagger L\} = 1$.

Receiver $k$'s output will have variance

$$ V \sum_{m=1}^{M} |h_k^\dagger l_m|^2 + \mathcal{N}_0, $$

where $l_m$ is the $m$th column of $L$. Among all random variables with this variance,
the entropy, and therefore the mutual information in (5.11), is maximized with a
Gaussian distribution [14]. This can be achieved with an i.i.d. Gaussian vector $s$.
Since this same distribution maximizes the mutual information for all receivers (given
a particular $\Gamma_x$), such a Gaussian vector is optimal. The overall system then becomes
equivalent to space-time multicast coding.


This  structure  is  also  optimal  when  coding  over  ergodic  variations  of  the  channel. 
When  the  optimal  input  distribution  is  equivalent  for  all  channel  realizations,  the 
maximum  achievable  rate  for  a  receiver  is  the  expected  value  of  mutual  information 
(shown  in  [46]  for  general  channels  and  applied  to  fading  channels  in  [31]).  For  the 
case  of  space-time  multicast  codes,  the  distribution  on  s  is  the  same  for  all  receivers 
and  all  channel  realizations.  Since  an  arbitrary  G  achieves  all  instantaneous  operating 
points,  we  also  achieve  all  operating  points  on  the  ergodically-varying  channel. 


Although we have not technically defined the set of mutual information K-tuples
as a rate region, we should still double check whether timesharing can expand the
region. Consider two matrices $G_1$ and $G_2$ used for space-time multicast, resulting in
mutual information vectors $\log_2(1 + \boldsymbol{\gamma}_1)$ and $\log_2(1 + \boldsymbol{\gamma}_2)$, respectively. If the first
scheme is used a fraction $\beta$ of the time, we get through timesharing,

$$
\beta \begin{bmatrix} \log_2(1 + \gamma_{1,1}) \\ \log_2(1 + \gamma_{1,2}) \\ \vdots \\ \log_2(1 + \gamma_{1,K}) \end{bmatrix}
+ (1 - \beta) \begin{bmatrix} \log_2(1 + \gamma_{2,1}) \\ \log_2(1 + \gamma_{2,2}) \\ \vdots \\ \log_2(1 + \gamma_{2,K}) \end{bmatrix}
= \begin{bmatrix}
\beta \log_2(1 + \gamma_{1,1}) + (1 - \beta) \log_2(1 + \gamma_{2,1}) \\
\beta \log_2(1 + \gamma_{1,2}) + (1 - \beta) \log_2(1 + \gamma_{2,2}) \\
\vdots \\
\beta \log_2(1 + \gamma_{1,K}) + (1 - \beta) \log_2(1 + \gamma_{2,K})
\end{bmatrix}.
$$

However, using Jensen's inequality, this vector is the same or inferior for every receiver
to the vector

$$
\begin{bmatrix}
\log_2(1 + \beta \gamma_{1,1} + (1 - \beta) \gamma_{2,1}) \\
\log_2(1 + \beta \gamma_{1,2} + (1 - \beta) \gamma_{2,2}) \\
\vdots \\
\log_2(1 + \beta \gamma_{1,K} + (1 - \beta) \gamma_{2,K})
\end{bmatrix},
$$

which is achievable by space-time multicast coding with the matrix

$$ G = \begin{bmatrix} \sqrt{\beta}\, G_1 & \sqrt{1 - \beta}\, G_2 \end{bmatrix}. $$

Similar reasoning shows that timesharing between more than two points does not add
to the region, either. □
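The Jensen step can be checked numerically. The following is a minimal sketch, not part of the original development: the function names and the equivalent-SNR values are hypothetical, chosen only to compare the timeshared rate with the rate of the stacked matrix $[\sqrt{\beta}\,G_1\ \ \sqrt{1-\beta}\,G_2]$ for each receiver.

```python
import math

def timeshare_rate(gamma1, gamma2, beta):
    """Rate when scheme 1 is used a fraction beta of the time and scheme 2 otherwise."""
    return beta * math.log2(1 + gamma1) + (1 - beta) * math.log2(1 + gamma2)

def stacked_rate(gamma1, gamma2, beta):
    """Rate of the single code built from [sqrt(beta) G1, sqrt(1 - beta) G2]."""
    return math.log2(1 + beta * gamma1 + (1 - beta) * gamma2)

# Hypothetical equivalent SNRs (linear scale) of three receivers under each scheme.
gammas1 = [8.0, 0.5, 3.0]
gammas2 = [0.2, 6.0, 3.0]
beta = 0.4

for g1, g2 in zip(gammas1, gammas2):
    # By concavity of the logarithm, the first value never exceeds the second.
    print(round(timeshare_rate(g1, g2, beta), 4), round(stacked_rate(g1, g2, beta), 4))
```

The two rates coincide only when a receiver sees the same equivalent SNR under both schemes, as for the third receiver in this illustrative data.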


The following interpretation of space-time multicast coding provides a connection
to the SNR operating characteristic described earlier in Section 5.2. Consider the
linear transformation in Fig. 5-5 as beamforming each of the parallel sequences along
a direction corresponding to a column of $G$. Let $G$ in turn be written as

$$ G = \begin{bmatrix} \alpha_1 g_1 & \alpha_2 g_2 & \cdots & \alpha_N g_N \end{bmatrix}, $$

where the $g_n$ are unit-length vectors, i.e., $\|g_n\| = 1$ for all $n$. To satisfy the power
constraint, the $\alpha_n$'s must be chosen so that

$$ \sum_{n=1}^{N} |\alpha_n|^2 \le 1. $$

The mutual information to the $k$th receiver is given by (5.10). Note that the second
term inside the logarithm is

$$ \frac{h_k^\dagger \Gamma_x h_k}{\mathcal{N}_0} = \frac{V}{\mathcal{N}_0} \|h_k^\dagger G\|^2 = \frac{V}{\mathcal{N}_0} \sum_{n=1}^{N} |\alpha_n|^2\, |h_k^\dagger g_n|^2. \tag{5.12} $$

This is equal to the time-average SNR had the stream been transmitted along the
columns of $G$ at different times (with each being used a fraction of time $|\alpha_n|^2$). Unlike
with timesharing, however, the coded rate corresponding to this “equivalent SNR”
is actually achievable. Beamforming falls out as a special case when the covariance
matrix $\Gamma_x$ has rank one.
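As an illustration of (5.12), the sketch below evaluates the equivalent SNR for a given set of scaled columns $\alpha_n g_n$. It assumes the convention $y_k = h_k^\dagger x + w_k$ used in this chapter; the helper names and the example channel are hypothetical.

```python
import math

def inner(h, g):
    """Return h^dagger g for complex vectors given as Python lists."""
    return sum(hi.conjugate() * gi for hi, gi in zip(h, g))

def equivalent_snr(h, columns, v_over_n0):
    """Equivalent SNR of (5.12); `columns` holds the scaled columns alpha_n * g_n of G."""
    return v_over_n0 * sum(abs(inner(h, col)) ** 2 for col in columns)

def mutual_information(h, columns, v_over_n0):
    """Mutual information of (5.10), in bits per symbol."""
    return math.log2(1 + equivalent_snr(h, columns, v_over_n0))

# Hypothetical 3-element channel vector for one receiver.
h = [1 + 1j, 0.5j, -1.0]
norm = math.sqrt(sum(abs(x) ** 2 for x in h))

# Rank-one case: beamforming straight at this receiver.
beam = [[x / norm for x in h]]
# Scaled identity: equal power on each antenna element, no channel knowledge.
identity = [[1 / math.sqrt(3) if i == m else 0.0 for i in range(3)] for m in range(3)]

print(equivalent_snr(h, beam, 1.0), equivalent_snr(h, identity, 1.0))
```

With a single matched column the result is $\|h\|^2 V/\mathcal{N}_0$, while the scaled identity gives that value divided by the number of antenna elements.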

This  motivates  the  use  of  “equivalent  SNRs,”  such  as  (5.12),  as  a  convenient 
parametrization  for  mutual  information.  In  this  way,  the  performance  of  space-time 
multicast  codes  is  seen  as  a  kind  of  averaging  between  beamforming  strategies.  The 
achievable region, in terms of equivalent SNR K-tuples, becomes the convex hull
of  the  beamforming  region.  It  is  important  to  keep  in  mind  that  for  higher-rank 
covariance  matrices,  the  equivalent  SNR  simply  represents  the  SNR  of  an  additive 
white  Gaussian  noise  channel  with  the  same  mutual  information,  and  in  general  is  not 
a  true  SNR  achievable  by  uncoded  systems.  Later,  we  will  develop  implementations 
that  are  more  amenable  to  uncoded  transmission. 

This discussion also relates to the theory of space-time codes that do not
incorporate channel knowledge at the transmitter. Without channel information, the
transmitter can still set $G$ to be a scaled identity matrix, or any other unitary matrix,
and achieve the set of equivalent SNRs

$$ \mathrm{SNR}_k = \frac{\|h_k\|^2}{M} \cdot \frac{V}{\mathcal{N}_0}, \qquad k = 1, \ldots, K. \tag{5.13} $$

It also maximizes the ergodic capacity in these situations [63]. Exact implementations
would be complex, so various space-time codes have been developed to approach this
performance at lower complexity. The goal of (5.13) is achievable for space-time block
codes designed for two transmit antennas [1] but has not been reached for larger sizes
[62]. It also shows up as an ideal “matched filter bound” in an early version of
space-time trellis coding sometimes called delay diversity [78]. We will demonstrate
space-time multicast code implementations that approach the equivalent SNR to the
extent that these other types of designs do.

5.4.2  Beamforming  Versus  Higher-Rank  Covariances 

We  have  identified  beamforming  as  a  subset  of  space-time  multicast  codes  where  the 
covariance  of  the  antenna  outputs  has  rank  one.  Such  strategies  have  low  complexity 
and  are  compatible  with  most  types  of  coded  or  uncoded  modulation,  but  have  poor 
outage  characteristics  when  the  number  of  receivers  grows  large.  In  this  section,  we 
investigate  when  beamforming  is  sufficient  from  an  optimality  standpoint,  and  when 
higher-rank  covariances  are  necessary  to  achieve  certain  operating  points. 

Two  Antenna  Elements  or  Two  Receivers 

In  earlier  sections,  we  found  that  beamforming  strategies  work  well  when  the  number 
of  receivers  is  small.  Using  the  concept  of  equivalent  SNRs,  we  can  now  make  this 
statement more precise. We show that beamforming is entirely sufficient for
multicasting to two receivers, and then look at where it breaks down as the number of
receivers is increased.

Proposition 4 Suppose a transmitter sends information from an M-element array
to two receivers. Then all efficient operating points can be achieved using a rank-one
covariance; in other words, by a beamforming strategy.

Space-time  multicast  coding  is  already  known  to  be  optimal.  Since  it  averages 
the  effective  SNR  for  each  receiver  over  several  beamforming  directions,  we  can  prove 
the statement above by showing that the set of $(\mathrm{SNR}_1, \mathrm{SNR}_2)$ pairs achievable
by  beamforming  is  convex.  One  way  to  do  this  is  to  simply  enumerate  all  of  these 
points. 

First, recall that the transmitter should only send in directions that are in the
span of the two receivers' channel vectors; components outside of this subspace will
simply produce nulls at the receivers and waste power. Using the Gram-Schmidt
procedure, this space can be parametrized by a component in the direction of the
first receiver's channel vector, and a second component orthogonal to this. Therefore,
the M-antenna problem can be reduced to an equivalent two-antenna problem with
lower-triangular channel matrix and channel vectors

$$ h_1 = \begin{bmatrix} L_{1,1} \\ 0 \end{bmatrix}, \qquad h_2 = \begin{bmatrix} L_{2,1} \\ L_{2,2} \end{bmatrix}. $$

If the transmitter is operating at the power constraint $\|g\|^2 = 1$, the beamforming
vector $g$ can be parametrized as

$$ g = \begin{bmatrix} \cos\theta \\ e^{j\phi} \sin\theta \end{bmatrix}, $$

where the two angles $\theta$, which is in the range $[0, \pi/2)$, and $\phi$, in $[0, 2\pi)$, produce
the relative gain and phase. We can then enumerate all of the SNR pairs that are
attainable by beamforming:

$$ \mathrm{SNR}_1 \frac{\mathcal{N}_0}{V} = |L_{1,1}|^2 \cos^2\theta, \tag{5.14} $$

$$ \mathrm{SNR}_2 \frac{\mathcal{N}_0}{V} = |L_{2,1}|^2 \cos^2\theta + |L_{2,2}|^2 \sin^2\theta + |L_{2,1} L_{2,2}| \sin(2\theta) \cos(\phi - \phi_2), \tag{5.15} $$

where $\phi_2$ is defined using

$$ L_{2,1}^* L_{2,2} = |L_{2,1} L_{2,2}|\, e^{j\phi_2}. $$

Performance is clearly maximized by choosing $\phi = \phi_2$ so that the final cosine term
in (5.15) is equal to one. The transmitter operating characteristic curve in Fig. 5-1
is produced by following this trajectory as well as a similar one when the ordering of
the two receivers is reversed. The solid portion, which is equivalent to maximizing
a (non-negative) weighted sum of SNRs, is the intersection of these two curves. Any
point in SNR-space that is inside these boundaries can be achieved by transmitting
below the power constraint. By taking second derivatives of (5.14)-(5.15) along the
boundary, it can be shown that the overall region is convex. □
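The parametrization in (5.14)-(5.15) can be cross-checked against a direct evaluation of $|h_k^\dagger g|^2$. The sketch below is illustrative only: it assumes the received-signal convention $y_k = h_k^\dagger g\, s + w_k$, normalizes by $V/\mathcal{N}_0$, and uses hypothetical function names and channel entries.

```python
import cmath
import math

def snr_pair(L11, L21, L22, theta, phi):
    """Directly computed (SNR1, SNR2), normalized by V/N0, for g = [cos t, e^{j phi} sin t]."""
    g = (math.cos(theta), cmath.exp(1j * phi) * math.sin(theta))
    snr1 = abs(L11.conjugate() * g[0]) ** 2
    snr2 = abs(L21.conjugate() * g[0] + L22.conjugate() * g[1]) ** 2
    return snr1, snr2

def snr2_closed_form(L21, L22, theta, phi):
    """Right-hand side of (5.15), with phi_2 taken as the phase of conj(L21) * L22."""
    phi2 = cmath.phase(L21.conjugate() * L22)
    return (abs(L21) ** 2 * math.cos(theta) ** 2
            + abs(L22) ** 2 * math.sin(theta) ** 2
            + abs(L21 * L22) * math.sin(2 * theta) * math.cos(phi - phi2))

# Hypothetical entries of the reduced lower-triangular channel.
L11, L21, L22 = 1.2 + 0j, 0.4 - 0.3j, 0.2 + 0.9j
phi2 = cmath.phase(L21.conjugate() * L22)

# Trace the boundary phi = phi_2 for a few gain angles theta.
for theta in (0.0, 0.4, 0.8, 1.2):
    print(snr_pair(L11, L21, L22, theta, phi2))
```

Sweeping $\theta$ with $\phi = \phi_2$ traces one of the two trajectories described above; swapping the roles of the receivers gives the other.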

This shows that beamforming is sufficient in the two-receiver case with any number
of transmit antenna elements. On the other hand, we know that it is not optimal for a
two-element array and a large number of receivers. To understand where this behavior
changes, the method above can be extended to two transmit antenna elements and
any number of receivers. Using the same parametrization as before, we have

$$ h_k = \begin{bmatrix} L_{k,1} \\ L_{k,2} \end{bmatrix}, \qquad k = 2, 3, \ldots, K $$

and

$$ \mathrm{SNR}_k \frac{\mathcal{N}_0}{V} = |L_{k,1}|^2 \cos^2\theta + |L_{k,2}|^2 \sin^2\theta + |L_{k,1} L_{k,2}| \sin(2\theta) \cos(\phi - \phi_k), \qquad k = 2, 3, \ldots, K. $$

Consider this for $K = 4$ receivers. The three $\phi_k$ are parameters of the realized channel
vectors. Take the case of $\phi_2 = 0$, $\phi_3 = 2\pi/3$, and $\phi_4 = 4\pi/3$. Holding $\theta$ constant
and alternating between $\phi = 0$ and $\phi = \pi$ makes all three cosine terms average
to zero, while it is impossible to make all of them simultaneously nonnegative for
any single $\phi$. Therefore, the equivalent point in SNR-space corresponding to this
alternating strategy is achievable only through space-time multicast coding with a
rank two covariance. Beamforming from a two-element array is apparently no longer
sufficient when there are four or more receivers.
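The claim about the three cosine terms is easy to verify numerically. The following sketch (hypothetical function names, with a simple grid search standing in for a proof) checks that alternating between $\phi = 0$ and $\phi = \pi$ averages every cross term to zero, while no single $\phi$ on the grid makes all three terms nonnegative.

```python
import math

# Channel phase parameters phi_2, phi_3, phi_4 from the example in the text.
PHASES = [0.0, 2 * math.pi / 3, 4 * math.pi / 3]

def cross_terms(phi):
    """The three cosine factors cos(phi - phi_k) for a single beamforming phase."""
    return [math.cos(phi - p) for p in PHASES]

def alternating_average():
    """Average of the cosine factors when alternating between phi = 0 and phi = pi."""
    first, second = cross_terms(0.0), cross_terms(math.pi)
    return [(a + b) / 2 for a, b in zip(first, second)]

def best_single_phase(steps=3600):
    """Largest achievable value of the smallest cosine factor over a grid of phi."""
    return max(min(cross_terms(2 * math.pi * i / steps)) for i in range(steps))

print(alternating_average())
print(best_single_phase())
```

On this grid the best single phase still leaves the worst cosine term clearly negative, so any single beamformer gives at least one receiver a negative cross term, whereas the rank-two alternating strategy gives none.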


Arbitrary  Numbers  of  Antennas  and  Receivers 

For  larger  systems,  parametrizations  of  all  beamforming  operating  points  such  as 
(5.14)-(5.15)  become  very  cumbersome.  However,  with  some  additional  geometric 
insight,  we  can  generalize  the  two  antenna  element  results  and  conjecture  that  for  M 
transmit  antenna  elements,  beamforming  becomes  suboptimal  when  transmitting  to 
2M  or  more  receivers.  This  is  done  with  essentially  a  dimension-counting  argument. 

In  general,  beamforming  with  an  M-element  array  requires  specifying  2M  real 
parameters:  the  individual  gains  and  phases  applied  to  the  different  antenna  inputs. 
For SNR purposes, however, there are really only $2M - 1$ degrees of freedom, because
an overall phase can be factored out without affecting performance. This implies that
the region achievable by beamforming has dimension no greater than $2M - 1$.

Space-time multicast coding achieves any convex combination of points in SNR-space
that are achievable by beamforming. Mathematically, this is a convex hull
operation [56]. If we can show that the resulting region has a higher dimension than
$2M - 1$, then clearly beamforming is not sufficient. One way to do this is to show
that there are at least $2M$ linearly independent points in the beamforming region.

We conjecture that this is true with probability one when the number of receivers
is at least $2M$. Consider the points in SNR-space corresponding to the $K$ single-user
beamforming directions. When beamforming in the direction of the $k$th receiver, the
vector of received SNRs is

$$ \frac{1}{\|h_k\|^2} \cdot \frac{V}{\mathcal{N}_0} \begin{bmatrix} |h_k^\dagger h_1|^2 \\ \vdots \\ |h_k^\dagger h_K|^2 \end{bmatrix}. $$

If we collect these vectors into a $K \times K$ matrix, its rank will be equal to the number of
linearly independent points in SNR-space achieved by these $K$ particular beamforming
directions.

Multiplying each column by a constant will not change the rank, so we therefore
wish to find the rank of the matrix

$$ B \circ B^*, $$

where

$$ B = H H^\dagger, \tag{5.16} $$

$B^*$ represents the conjugation (but not transpose) of $B$, and “$\circ$” represents the
element-by-element Hadamard product. Applying a singular value decomposition,
$B = U \Sigma V^\dagger$, the Hadamard product above can be taken as a particular submatrix of

$$ (U \otimes U^*)(\Sigma \otimes \Sigma^*)(V^\dagger \otimes V^T), $$

where $\otimes$ represents the Kronecker product [38]. By noting that $B$ is Hermitian (so
that $U = V$) and carefully inspecting the individual elements, it can be shown that

$$ B \circ B^* = B_u B_u^\dagger, $$

where $B_u$ consists of all possible columns of the form

$$ \sqrt{\sigma_i \sigma_k}\; u_i^* \circ u_k. $$

The rank of the overall product is the same as the rank of $B_u$. From (5.16), there can
be at most $M$ nonzero singular values $\sigma_i$, at most $M^2$ possible nonzero columns of
$B_u$, and therefore the maximum overall rank is $\min(K, M^2)$. Further general analysis
appears difficult, but our simulations and analysis of special cases suggest that this
maximum rank does hold true. We therefore conjecture that when $K \ge 2M$ then
with probability one, there are at least $2M$ linearly independent points in SNR-space,
and consequently higher-rank covariance matrices are necessary to achieve all possible
points in SNR-space.
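A small experiment in the spirit of the simulations mentioned above can be sketched as follows. It draws random complex Gaussian channel vectors (an assumption made here for illustration), forms the matrix with entries $|h_j^\dagger h_k|^2$, and estimates its rank by Gaussian elimination. All names and the tolerance are hypothetical choices, not the thesis's simulation code.

```python
import random

def squared_inner_products(H):
    """Matrix with entry (j, k) equal to |h_j^dagger h_k|^2; H is a list of channel vectors."""
    def inner(a, b):
        return sum(x.conjugate() * y for x, y in zip(a, b))
    return [[abs(inner(hj, hk)) ** 2 for hk in H] for hj in H]

def numerical_rank(A, tol=1e-9):
    """Rank via Gaussian elimination with partial pivoting; tol is relative to the largest entry."""
    A = [row[:] for row in A]
    rows, cols = len(A), len(A[0])
    scale = max(abs(x) for row in A for x in row) or 1.0
    rank = 0
    for c in range(cols):
        if rank == rows:
            break
        pivot = max(range(rank, rows), key=lambda r: abs(A[r][c]))
        if abs(A[pivot][c]) <= tol * scale:
            continue  # no usable pivot in this column
        A[rank], A[pivot] = A[pivot], A[rank]
        for r in range(rank + 1, rows):
            factor = A[r][c] / A[rank][c]
            for j in range(c, cols):
                A[r][j] -= factor * A[rank][j]
        rank += 1
    return rank

def random_channels(K, M, rng):
    """K independent length-M vectors with i.i.d. complex Gaussian entries."""
    return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(M)] for _ in range(K)]

rng = random.Random(0)
for K, M in [(3, 2), (4, 2), (6, 2)]:
    rank = numerical_rank(squared_inner_products(random_channels(K, M, rng)))
    print(K, M, rank, min(K, M * M))
```

The last two printed columns compare the estimated rank with the bound $\min(K, M^2)$; a numerical check of this kind supports the conjecture for particular draws but does not prove it.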


5.4.3  Implementation  Issues 

Multicasting  With  Arbitrary  Coding  and  Modulation 


The points achievable by higher-rank covariances are in general not available for
arbitrary signaling, but rather are equivalent SNRs for rates achieved by particular
vector-coded systems. More practical implementations may take their inspiration
from existing space-time codes that are adapted to take advantage of channel
knowledge.

For example, orthogonal space-time block codes can easily be converted to use
any covariance matrix. These codes are compatible with arbitrary modulation and
scalar coding and have simple detection algorithms. In the Alamouti scheme for two
transmit antenna elements [1], the transmitter sends two symbols, $s[1]$ and $s[2]$, over
two time periods:

$$
\begin{aligned}
\text{Time 1:}\quad x[1] &= \begin{bmatrix} 1 \\ 0 \end{bmatrix} s[1] + \begin{bmatrix} 0 \\ 1 \end{bmatrix} s[2] \\
\text{Time 2:}\quad x[2] &= \begin{bmatrix} 0 \\ 1 \end{bmatrix} s^*[1] - \begin{bmatrix} 1 \\ 0 \end{bmatrix} s^*[2].
\end{aligned}
\tag{5.17}
$$

A receiver with channel vector $h = [h_1\ h_2]^T$ gets

$$
\begin{aligned}
y[1] &= h_1^* s[1] + h_2^* s[2] + w[1] \\
y[2] &= h_2^* s^*[1] - h_1^* s^*[2] + w[2].
\end{aligned}
$$

The receiver, knowing the channel, can recover the input symbols by taking linear
combinations and conjugations,

$$
\begin{aligned}
\hat{s}[1] &= h_1 y[1] + h_2^* y^*[2] \\
\hat{s}[2] &= h_2 y[1] - h_1^* y^*[2],
\end{aligned}
$$

to achieve the ideal space-time coding SNR of $\|h\|^2 V / 2\mathcal{N}_0$.

The orthogonal signaling vectors in (5.17) are used because the channel is assumed
not to be known at the transmitter. However, the procedure will work just as well
with arbitrary vectors, $g_1$ and $g_2$:

$$
\begin{aligned}
\text{Time 1:}\quad x[1] &= g_1 s[1] + g_2 s[2] \\
\text{Time 2:}\quad x[2] &= g_2 s^*[1] - g_1 s^*[2].
\end{aligned}
$$

Now, instead of being sent on channels $h_1^*$ and $h_2^*$, the symbols are sent on $h^\dagger g_1$ and
$h^\dagger g_2$. The rest of the procedure works exactly the same as before but with these
substitutions, achieving the received SNR

$$ \mathrm{SNR} = \left( |h^\dagger g_1|^2 + |h^\dagger g_2|^2 \right) \frac{V}{2\mathcal{N}_0}. $$

If $g_1 = g_2 = h/\|h\|$, then the full single-user SNR of $\|h\|^2 V/\mathcal{N}_0$ is achieved. For
multicast streams, such coherent combining will usually not be possible for all receivers,
so distinct vectors $g_1$ and $g_2$ will be used. Also note that it is possible to use different
power distributions among the two $g_i$ vectors and effectively produce any weighted
average of SNRs between the two, achieving all the points we expect from space-time
multicast coding that uses a rank $N = 2$ covariance (5.12).
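The adapted scheme can be sketched in a few lines. The example below is a noise-free illustration under the convention $y[t] = h^\dagger x[t]$. The function names, the three-element channel, and the particular $g_1$, $g_2$ are hypothetical, and the combiner divides by the gain $|h^\dagger g_1|^2 + |h^\dagger g_2|^2$ so that the estimates can be compared directly with the transmitted symbols.

```python
def inner(h, g):
    """Return h^dagger g for complex vectors given as Python lists."""
    return sum(hi.conjugate() * gi for hi, gi in zip(h, g))

def transmit(g1, g2, s1, s2):
    """Antenna outputs over two symbol periods for transmission vectors g1 and g2."""
    x1 = [a * s1 + b * s2 for a, b in zip(g1, g2)]
    x2 = [b * s1.conjugate() - a * s2.conjugate() for a, b in zip(g1, g2)]
    return x1, x2

def receive(h, x1, x2):
    """Noise-free receiver outputs y[t] = h^dagger x[t]."""
    return inner(h, x1), inner(h, x2)

def combine(h, g1, g2, y1, y2):
    """Alamouti-style combining on the effective channels c1 = h^dagger g1, c2 = h^dagger g2."""
    c1, c2 = inner(h, g1), inner(h, g2)
    gain = abs(c1) ** 2 + abs(c2) ** 2
    s1_hat = (c1.conjugate() * y1 + c2 * y2.conjugate()) / gain
    s2_hat = (c2.conjugate() * y1 - c1 * y2.conjugate()) / gain
    return s1_hat, s2_hat, gain

# Hypothetical three-element array with two non-orthogonal transmission vectors.
h = [0.8 - 0.2j, -0.3 + 0.5j, 0.1 + 0.9j]
g1 = [0.5, 0.5j, 0.0]
g2 = [0.0, 0.3, -0.4j]
s1, s2 = 1 + 1j, -1 + 1j

x1, x2 = transmit(g1, g2, s1, s2)
y1, y2 = receive(h, x1, x2)
print(combine(h, g1, g2, y1, y2))
```

Setting `g1 = [1, 0]` and `g2 = [0, 1]` for a two-element array reduces this to the original combining rule, since the effective channels then become $h_1^*$ and $h_2^*$.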

It is important to note that although the Alamouti scheme is for a two-element
antenna, our adapted version for arbitrary transmission vectors will work for any
number of antenna elements, as long as we only wish to average two beamforming
directions. In this way, it can achieve the equivalent SNR of space-time multicast
coding with any rank-two covariance matrix, but now with arbitrary modulation
and coding. Other orthogonal space-time block codes can be adapted for covariance
matrices above rank two with a procedure analogous to that outlined above. Instead
of averaging the SNRs over the channel components $h_i$, we average over the SNRs of
the inner products $h^\dagger g_i$. Unfortunately, all orthogonal space-time codes of this type
with rank higher than two incur a rate penalty [62]. Still, optimizing the beamforming
directions rather than simply using orthogonal vectors can lead to significant SNR
improvement. Other techniques such as space-time trellis codes can be similarly
converted to achieve “diversity” over the $h^\dagger g_i$.


Finding  Operating  Points 

What remains is a method for finding good operating points for space-time
multicast coding. We concentrate here on maximizing the minimum performance among
receivers for each channel realization, which leads to the highest coded rate that all
receivers can understand. Recall that outage-based operating points such as this are
where beamforming strategies are weakest.

In general, this problem represents a maximization of a concave function over a
convex set,

$$ \max_{G:\ \mathrm{trace}\{G^\dagger G\} \le 1}\ \min_k\ h_k^\dagger G G^\dagger h_k, $$

implying that every local maximum is also a global maximum [56]. This suggests that
iterative optimization algorithms might be useful. Still, the convex set of all achievable
points is rather complicated, which might make an exact approach difficult.

This becomes more tractable if broken down into the separate problems of
finding unit-length column vectors for the matrix $G$ and corresponding weights on those
vectors. Given a set of unit vectors, the convex domain is polyhedral, and the
optimization can be converted to a linear programming problem that can be solved
with the simplex method [56, 37], a standard linear optimization tool. For the unit
vectors themselves, we will find that the single-user beamforming directions lead to
good results. For example, for $K$ receivers and the three received SNR vectors, $\boldsymbol{\gamma}_1$,
$\boldsymbol{\gamma}_2$, and $\boldsymbol{\gamma}_3$, the goal is

$$ \max_{\alpha_1, \alpha_2, \alpha_3}\ \min_{k = 1, \ldots, K}\ \left( \alpha_1 \gamma_{1,k} + \alpha_2 \gamma_{2,k} + \alpha_3 \gamma_{3,k} \right), $$

where $\alpha_1$, $\alpha_2$, and $\alpha_3$ are all nonnegative and sum to one. This can be reformulated
by introducing as a new variable the max min SNR goal, $\alpha_4$:

Maximize $\alpha_4$

given the constraints

$$
\begin{aligned}
\sum_{i=1}^{3} \alpha_i \gamma_{i,k} - \alpha_4 &\ge 0, \qquad k = 1, \ldots, K \\
\alpha_1 + \alpha_2 + \alpha_3 &\le 1 \\
\alpha_i &\ge 0, \qquad i = 1, 2, 3.
\end{aligned}
\tag{5.18}
$$

With at most a couple of sign changes of coefficients to get all the inequalities in the
same direction, this fits the form of the Matlab command linprog and other simplex
method implementations.
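For intuition, the max-min weighting can be solved exactly without a linear programming package when only two directions are averaged. The sketch below is a simplified stand-in for (5.18), not the simplex method itself: it assumes full power is used (weights $a$ and $1 - a$), and relies on the fact that the lower envelope of the receivers' lines is concave and piecewise linear, so the optimum lies at an endpoint or at a crossing of two lines. The function name and data are hypothetical.

```python
def max_min_two_directions(gamma1, gamma2):
    """Maximize min_k (a * gamma1[k] + (1 - a) * gamma2[k]) over 0 <= a <= 1.

    gamma1[k] and gamma2[k] are receiver k's SNRs under directions 1 and 2.
    Returns the best weight a and the resulting weakest-receiver SNR.
    """
    def worst(a):
        return min(a * g1 + (1 - a) * g2 for g1, g2 in zip(gamma1, gamma2))

    candidates = [0.0, 1.0]
    K = len(gamma1)
    for j in range(K):
        for k in range(j + 1, K):
            # Receiver j's line and receiver k's line cross where
            # a * (g1j - g2j) + g2j == a * (g1k - g2k) + g2k.
            slope_diff = (gamma1[j] - gamma2[j]) - (gamma1[k] - gamma2[k])
            if slope_diff != 0:
                a = (gamma2[k] - gamma2[j]) / slope_diff
                if 0 <= a <= 1:
                    candidates.append(a)
    best = max(candidates, key=worst)
    return best, worst(best)

# Hypothetical SNRs: direction 1 favors receiver 0, direction 2 favors receiver 1.
print(max_min_two_directions([4.0, 1.0], [1.0, 3.0]))
```

With three or more directions the candidate set grows quickly, which is where a general linear programming solver applied to (5.18) becomes the more practical choice.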


Two-Receiver  Illustration 


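For the specific case of two receivers, an interesting result illustrates the relationship
between sending common information and distinct information to two receivers. With
distinct information, the (often suboptimal, but tractable) zero-forcing beamforming
leads to SNRs of

$$
\begin{aligned}
\mathrm{SNR}_1 &= \alpha \left( \|h_1\|^2 - \frac{|h_1^\dagger h_2|^2}{\|h_2\|^2} \right) \frac{V}{\mathcal{N}_0} \\
\mathrm{SNR}_2 &= (1 - \alpha) \left( \|h_2\|^2 - \frac{|h_1^\dagger h_2|^2}{\|h_1\|^2} \right) \frac{V}{\mathcal{N}_0},
\end{aligned}
$$

where $\alpha$ is the fraction of power sent to the first receiver. Using power control to
equalize the SNRs leads to

$$ \mathrm{SNR}_1 = \mathrm{SNR}_2 = \frac{\|h_1\|^2 \|h_2\|^2 - |h_1^\dagger h_2|^2}{\|h_1\|^2 + \|h_2\|^2} \cdot \frac{V}{\mathcal{N}_0}. \tag{5.19} $$

On the other hand, with space-time multicast coding using the two single-user
beamforming directions,

$$
\begin{aligned}
\mathrm{SNR}_1 &= \left( \alpha \|h_1\|^2 + (1 - \alpha) \frac{|h_1^\dagger h_2|^2}{\|h_2\|^2} \right) \frac{V}{\mathcal{N}_0} \\
\mathrm{SNR}_2 &= \left( \alpha \frac{|h_1^\dagger h_2|^2}{\|h_1\|^2} + (1 - \alpha) \|h_2\|^2 \right) \frac{V}{\mathcal{N}_0},
\end{aligned}
$$

where $\alpha$ is the fraction of power sent along the first receiver's direction. (This is
also suboptimal, because we showed that for two receivers, rank-one beamforming
is best.) We wish to optimize $\alpha$ to maximize the minimum SNR. Since increasing
$\alpha$ always improves $\mathrm{SNR}_1$ at the expense of $\mathrm{SNR}_2$, the best scenario is when we can
make $\mathrm{SNR}_1 = \mathrm{SNR}_2$. If this is possible (that is, if the solution to $\mathrm{SNR}_1 = \mathrm{SNR}_2$ leads
to an $0 \le \alpha \le 1$), then

$$ \mathrm{SNR}_1 = \mathrm{SNR}_2 = \frac{\|h_1\|^2 \|h_2\|^2 + |h_1^\dagger h_2|^2}{\|h_1\|^2 + \|h_2\|^2} \cdot \frac{V}{\mathcal{N}_0}. \tag{5.20} $$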


Comparing  (5.19)  and  (5.20),  the  only  difference  is  in  the  sign  of  the  cross  term, 
which  is  essentially  the  deterministic  correlation  between  the  two  realized  channel 
vectors. When multiplexing separate data, correlation between channels is bad,
because it causes interference that either degrades performance or is to be avoided. For
multicast,  correlation  improves  performance,  avoiding  the  need  to  send  redundant 
information. 
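The sign flip between (5.19) and (5.20) can be reproduced numerically. The sketch below works with SNRs normalized by $V/\mathcal{N}_0$ and assumes the two channel vectors are not parallel (otherwise the equalizing weight is undefined). Function names and the example channels are hypothetical.

```python
def inner(a, b):
    """Return a^dagger b for complex vectors given as Python lists."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def channel_stats(h1, h2):
    """Return (||h1||^2, ||h2||^2, |h1^dagger h2|^2)."""
    return inner(h1, h1).real, inner(h2, h2).real, abs(inner(h1, h2)) ** 2

def zero_forcing_equalized(h1, h2):
    """Power split and per-receiver SNRs for zero-forcing with equalized SNRs."""
    a, b, c = channel_stats(h1, h2)
    alpha = b / (a + b)
    return alpha, alpha * (a - c / b), (1 - alpha) * (b - c / a)

def multicast_equalized(h1, h2):
    """Power split and per-receiver SNRs when averaging the two single-user directions."""
    a, b, c = channel_stats(h1, h2)
    alpha = a * (b * b - c) / ((a + b) * (a * b - c))
    return alpha, alpha * a + (1 - alpha) * c / b, alpha * c / a + (1 - alpha) * b

# Hypothetical, partially correlated channel vectors.
h1 = [1.0, 0.5j, 0.2]
h2 = [0.5, 1.0, -0.3j]
a, b, c = channel_stats(h1, h2)
print(zero_forcing_equalized(h1, h2), (a * b - c) / (a + b))
print(multicast_equalized(h1, h2), (a * b + c) / (a + b))
```

The equalized multicast SNR is valid as (5.20) only when the computed weight lies in $[0, 1]$; otherwise the best choice is one of the endpoints.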


5.4.4  Performance  of  Higher-Rank  Covariance  Matrices 

Although the value of channel information decreases as the number of receivers gets
large, our space-time multicast codes still exhibit a significant performance advantage
for moderate-sized systems. We illustrate this for an example where an 8-element
array multicasts a single stream to 8 receivers.

To communicate with all receivers reliably, we concentrate on maximizing the
minimum of equivalent SNRs among them. Once again, this optimization tends to
be very difficult in general, so we will constrain space-time multicast coding to using
a weighted set of single-user beamforming directions, as discussed in Section 5.4.3.
Optimal weights between the vectors were computed as in (5.18). Without channel
knowledge, ordinary space-time codes would ideally choose a $G$ matrix with
orthogonal columns.

[Figure 5-6 plot; axis label: Weakest Receiver SNR (dB)]

Figure 5-6: Weakest-receiver effective SNR, from simulations, when multicasting a
stream from an 8-element array to 8 receivers, at an input SNR per link of 5 dB.
The schemes shown are: “No Array”: single transmit antenna element; “No Tx
Knowledge”: space-time coding with orthogonal matrix $G$; “ST Multicast”: using
channel knowledge with the weights chosen by the method of (5.18).

We  compare  the  outage  performance  with  and  without  channel  knowledge  in 
Fig.  5-6.  As  expected,  there  is  a  large  gain  for  both  methods  over  not  using  an 
array.  Even  on  this  scale,  however,  channel-aware  transmission  noticeably  outpaces 
ordinary  space-time  codes.  The  outage  curves  have  similar  shapes,  but  are  separated 
by  about  6  dB  at  1%  outage.  To  place  this  into  context,  the  improvement  if  the 
transmitter  could  perfectly  direct  the  stream  to  all  receivers  simultaneously  would 
be $10 \log_{10} 8 \approx 9$ dB. For similar simulations with a 4-element array and 4 receivers,
about  4  dB  of  the  possible  6  dB  advantage  is  preserved.  Since  we  know  these  bounds 
are  unattainable,  the  fact  that  we  get  a  good  deal  of  the  way  there  speaks  to  the 
effectiveness of our methods and the usefulness of channel knowledge for multicast.

5.5  Multicast  Within  Larger  Systems


We now return to the larger picture of a system with a number of data streams, both
multicast and receiver-specific. Such systems must find ways for these streams to
coexist through scheduling, spatial multiplexing, or both. In our vision, the scheduler
doles out streams to different array processing subblocks for precoding, beamforming,
or multicasting; these subblocks in turn work together to get the data across without
undue interference. At a basic level, this consists of incorporating a multicast stream
into the model of earlier chapters, as a “metastream” that has a number of intended
receivers in its multicast group. This discussion brings together many of the
techniques developed in this thesis and provides an overall vision for how such a system
may operate.

5.5.1  Integration  Among  Array  Processing  Subblocks 

If the array processing task is to transmit more than one stream at once, it must find
a way not only to direct the data to its intended recipients but also not to cause
interference at other receivers. For individual-receiver streams, the transmitter's channel
information enabled us to use spatial precoding and beamforming to accomplish this.
We will find that similar techniques can reduce interference among multicast streams
or between a multicast stream and several individual-receiver streams.

Beamforming-based separation works in much the same way as before. Any
individual-receiver streams must set their beamforming directions to be orthogonal
to all other active receivers' channel vectors, including those in multicast groups. A
multicast stream similarly needs to transmit orthogonally to all receivers not in the
group. To find the proper space-time multicast coding parameters, the multicast
subblock should first project each of the channel vectors in the group away from all
receivers not in the group, and then optimize the group's transmission scheme based
on these new channel vectors. Power can be redistributed among the different streams
and metastreams as needed. Note that because space-time codes that do not make
use of channel knowledge are designed to spread their signal throughout the entire
space of possible directions, they are not appropriate for spatial multiplexing with
other streams in this way. This serves as an additional advantage of our space-time
multicast codes.
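The projection step can be sketched with a Gram-Schmidt pass over the out-of-group channel vectors. This is an illustrative implementation only: the function names, the tolerance, and the example vectors are hypothetical, and any numerically careful projection method would serve equally well.

```python
def inner(a, b):
    """Return a^dagger b for complex vectors given as Python lists."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def remove_components(v, basis):
    """Subtract from v its components along each orthonormal vector in basis."""
    for q in basis:
        coeff = inner(q, v)
        v = [vi - coeff * qi for vi, qi in zip(v, q)]
    return v

def project_away(h, others, tol=1e-12):
    """Component of h orthogonal to every channel vector in `others`."""
    basis = []
    for v in others:
        w = remove_components(list(v), basis)
        norm = abs(inner(w, w)) ** 0.5
        if norm > tol:  # skip vectors already in the span of earlier ones
            basis.append([wi / norm for wi in w])
    return remove_components(list(h), basis)

# Hypothetical group member h and two receivers outside the multicast group.
h = [1 + 1j, 0.5, -0.2j]
outsiders = [[0.3, 1j, 0.4], [1.0, 0.0, -1j]]
h_projected = project_away(h, outsiders)
print([abs(inner(v, h_projected)) for v in outsiders])
```

Transmitting along a direction built from the projected vectors produces no signal at the outside receivers, because each of their channel vectors has zero inner product with it.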

Precoding, which we saw achieve significant improvements over zero-forcing
beamforming in many cases, can be adapted in a more limited way. Recall that precoding
works on an ordered set of streams, where later streams precompensate for
interference from the earlier ones. Because a multicast stream must send the same message
to a number of receivers, yet each will receive a different linear combination of
interference from earlier streams, it is difficult to set up a precoding procedure for a
multicast group without suffering a rate loss. On the other hand, it is possible for
later-ordered streams to use precoding to precompensate for interference from one or
more multicast streams. This ordering also has the advantage that a multicast group
can compute its transmission scheme and performance before dealing with the other
receivers.


Putting this all together, we can group the active receivers into a number of
groups based on the array processing of their associated data streams. Individual-receiver
streams may go to the precoding subblock or may instead perform zero-forcing
beamforming, for instance if their receivers do not support modulo-extended slicers.
Each multicast stream has its own group of receivers. We can then partition the array
processing into a global preprocessing step and more local signaling done within each
group. First, order the total set of receivers such that the multicast groups are
first, then beamforming, and finally precoding. Then, the preprocessing step could
ensure that groups constrain their transmission to be orthogonal to channel vectors
in other multicast or beamforming groups. The multicast and beamforming groups
need not worry about causing interference for receivers in the precoding group, since
any crossover interference will be removed by precoding. At this point, processing
within each group can proceed as normal. What results is an effective channel matrix
(before precoding itself) that is a mix between block diagonal (for the multicast and
beamforming groups) and lower triangular (for the precoding group). For example, if
there is a multicast group with four receivers, a beamforming group with two
single-user streams, and a precoding group of two streams, then the possible non-zero entries
of this matrix are highlighted as

$$
\begin{bmatrix}
\times & \times & \times & \times & & & & \\
\times & \times & \times & \times & & & & \\
\times & \times & \times & \times & & & & \\
\times & \times & \times & \times & & & & \\
 & & & & \times & & & \\
 & & & & & \times & & \\
\times & \times & \times & \times & \times & \times & \times & \\
\times & \times & \times & \times & \times & \times & \times & \times
\end{bmatrix}
$$

This  procedure  allows  spatial  multiplexing  between  different  types  of  groups  while 
not  signihcantly  changing  the  operation  within  each  array  processing  subblock. 
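The structure of this effective channel matrix can be illustrated with a short sketch. The function name `effective_channel_mask` and its arguments are hypothetical, introduced only for this illustration; the code assumes the receiver ordering described above (multicast groups, then beamforming streams, then precoding streams) and simply marks which entries are allowed to be non-zero.

```python
import numpy as np

def effective_channel_mask(multicast_sizes, n_beamforming, n_precoding):
    """Boolean mask of the entries that may be non-zero.

    multicast_sizes: list with the number of receivers in each multicast group
    n_beamforming:   number of single-receiver zero-forcing streams
    n_precoding:     number of single-receiver precoded streams (ordered last)
    """
    # Multicast groups and beamforming streams form diagonal blocks.
    block_sizes = list(multicast_sizes) + [1] * n_beamforming
    n_blocked = sum(block_sizes)
    n = n_blocked + n_precoding
    mask = np.zeros((n, n), dtype=bool)

    start = 0
    for size in block_sizes:
        mask[start:start + size, start:start + size] = True
        start += size

    # A precoded receiver may see every earlier stream, since that
    # interference is precompensated, but no later stream.
    for row in range(n_blocked, n):
        mask[row, :row + 1] = True
    return mask

# The example from the text: one multicast group of four receivers,
# two beamforming streams, and two precoded streams.
mask = effective_channel_mask([4], 2, 2)
for row in mask:
    print(" ".join("X" if entry else "." for entry in row))
```

The printed pattern is block diagonal in its first six rows and lower triangular in its last two.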

5.5.2  Integration  at  the  Scheduling  Layer 

If there are more than a few potential streams, the system will likely also require the
integrated scheduling of multicast and individual-receiver data. The simplest
transmitters would timeshare between different types of data streams in a round-robin
manner. However, this ignores the possibility of capturing some of the multiplexing
gains we saw in earlier chapters. Recall that the potential throughput of systems
increased severalfold as the number of receivers approached the number of transmit
antenna elements. A more sophisticated system would attempt to use channel
information to select appropriate groups of streams for spatial multiplexing. The scheduler
then feeds the selected active streams into their respective array processing subblocks,
where the types of multiplexing described above can occur.

The scheduling algorithm will depend on the delay tolerance of the data, how
many intended receivers there are for each multicast stream, and whether
precoding is used for individual-receiver streams. Because both precoding and zero-forcing
beamforming require restricting the transmission of at least some streams, a good rule
of thumb for an M-element array may be to have no more than M active receivers
at any one time, unless all receivers belong to the same multicast stream. Beyond
this, the most important issues will once again be reducing the potential interference
between streams and, if the delay tolerance allows, selecting receivers whose channels
are of high instantaneous quality. Multicast streams make dealing with both of these
issues more difficult because the scheduler must satisfy all receivers of a particular
stream simultaneously.

One complexity-reducing solution would employ separate queues for multicast
and individual-receiver streams and communicate no more than one multicast group
at any single time. Once the active multicast group is selected, the scheduler can
treat these receivers as if they were getting separate streams as it selects additional
streams according to the algorithms of Chapter 4. For example, with moderate delay
constraints, this selection would seek a set of channel vectors that are nearly
orthogonal. For a transmitter with an 8-element array, a typical timeslot may include
a multicast stream with four receivers and three or four additional individual-receiver
streams. Unless the multicast groups consist of a very small number of receivers,
such a system is not likely to lose much in performance compared with a fully
integrated scheduler. The remaining challenges include building in fairness
constraints to strike the right balance between the different types of streams.
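As a rough illustration of this separate-queue approach, the sketch below fills a timeslot around an already selected multicast group. It is not one of the Chapter 4 algorithms: the greedy rule, the function names, and the correlation threshold `max_corr` are assumptions made for this example. It treats each multicast receiver as if it had its own stream, adds the individual receiver whose channel is most nearly orthogonal to those already chosen, and stops at M active receivers.

```python
import numpy as np

def max_correlation(h, selected):
    """Largest normalized |inner product| between h and the selected vectors."""
    h_unit = h / np.linalg.norm(h)
    return max((abs(np.vdot(s / np.linalg.norm(s), h_unit)) for s in selected),
               default=0.0)

def schedule_timeslot(multicast_channels, individual_channels, max_corr=0.5):
    """Choose individual receivers to share a timeslot with one multicast group.

    multicast_channels:  (G, M) array, one channel vector per multicast receiver
    individual_channels: dict mapping receiver id -> length-M channel vector
    max_corr:            hypothetical cutoff on acceptable channel correlation
    """
    M = multicast_channels.shape[1]
    selected = [row for row in multicast_channels]
    remaining = dict(individual_channels)
    chosen = []
    while remaining and len(selected) < M:
        # Candidate that is closest to orthogonal to everything selected so far.
        best = min(remaining, key=lambda k: max_correlation(remaining[k], selected))
        if max_correlation(remaining[best], selected) > max_corr:
            break
        selected.append(remaining.pop(best))
        chosen.append(best)
    return chosen

# Toy example with a 4-element array and a two-receiver multicast group.
e = np.eye(4)
multicast = np.array([e[0], e[1]])
candidates = {"a": e[2], "b": e[0] + e[2], "c": e[3]}
chosen = schedule_timeslot(multicast, candidates)
print(chosen)
```

Receiver "b" is skipped in this toy case because its channel overlaps strongly with one multicast receiver. A fairness constraint of the kind mentioned above would have to be layered on top of such a rule.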
Chapter 6

Conclusions  and  Future  Work 


In this thesis, we have discussed the design of various system components for a
transmitter antenna array as well as a higher-level view of how these components interact.
We found that a consideration of the channel parameters and input data stream
properties can be very useful at both the scheduling and array processing levels. Although
this may violate some of the principles of the traditional layered approach, the gains
achieved by channel-aware scheduling or sophisticated spatial multiplexing imply that
a rethinking may be in order.

In an effort to make our results applicable, we have centered the development
around implementations, design choices, and analysis of the key issues involved with
particular system tasks. Some of the major contributions include:

•  An overall framework for the integrated design of transmitter antenna array
systems. Of particular importance is the partitioning into scheduling and
array processing tasks, as outlined in the introduction. We found this led to
convenient problem formulations yet allowed for sufficient interaction among
components to approach the potential of the array. It also helped make clear
the different options for placing complexity throughout the system and the
associated performance of these choices.

•  At the array processing level, we added new insights and extensions to spatial
precoding. Building upon a series of recent results, our work concentrated on
variations of the basic precoding procedure to satisfy system goals for different
data classes, channel modeling assumptions, and modulation techniques. This
included changing the ordering of streams for different types of data,
adapting symbol constellations based on the interference distribution, and extending
precoding to multiuser intersymbol interference channels.


•  We  demonstrated  how  channel-aware  scheduling  techniques  have  the  potential 
to  increase  performance  for  a  number  of  data  types.  In  particular,  we  used  the 
delay  tolerance  of  the  various  streams  to  provide  the  scheduler  with  flexibility 
and  constraints  in  rearranging  the  ordering  and  grouping  of  streams.  Even 
for  a  small  amount  of  flexibility,  an  appropriate  grouping  can  help  the  array 
processing  achieve  much  better  reliability  and  higher  rates.  More  sophisticated 
scheduling  can  also  enable  lower  complexity  at  the  array  processing  level. 

•  For the multicasting of common data streams, we developed optimal signaling
techniques as well as more practical implementations. Among these were two
important methods, useful in different regimes, representing beamforming and
an adaptation of space-time codes to accommodate channel knowledge. In this
process, we helped define what it means for the transmitter to operate efficiently
in terms of balancing performance to the multiple recipients.

In  this  way,  we  have  considered  many  problems  in  detail,  yet  within  an  overall 
structure  in  which  individual  algorithms  may  be  included  or  replaced  depending  upon 
the  needs  of  an  individual  system. 

Future work can continue development within this structure. In addition to
numerous possible algorithmic improvements, this may take the form of expanding into
additional components at either end of the signal chain.

On the physical channel side, systems may be developed to more tightly
incorporate the mechanisms for obtaining channel information. This channel estimation
takes up system resources not accounted for in our discussion. Information about
the current data streams and previous channel states could potentially be used to
decide when and how much channel information to request. Another important goal
would be to further characterize the effect that partial, rather than perfect, channel
information has on the main components. Yet another direction is to include more
detail in the channel model, such as the movement of mobile receivers relative to the
transmitter, and then adapt scheduling algorithms to these models.

At the other side would be a further awareness of the data streams and their
performance goals. We have attempted to maximize rate or reliability-related goals
while respecting certain coarse delay constraints. The next step may be a more specific
investigation into the fundamental delay/throughput tradeoffs of spatial multiplexing
systems. A different, but related, direction involves the development of scheduling
algorithms to achieve more formal quality-of-service measures such as packet drop
rates and delay guarantees. As explained in Chapter 4, we anticipate that although
such goals are not necessarily a good match to array fading channels, in practice it
may be possible to meet them due to the robustness of scheduling over a constrained
set of available channel vectors.

Issues of a more global nature appear when a wireless network contains multiple
array transmitters. For example, signals from one transmitter will cause interference
on the communication from others. Cellular systems often mitigate this interference
by partitioning receivers and bandwidth resources among separate cells. A more
effective approach would use greater coordination across transmitters. At a conceptual
level, the different antenna elements from all of the transmitters may be considered
as one larger virtual array, upon which many of the techniques discussed in this thesis
may be applied. However, as networks extend from several cells to entire
metropolitan areas and more, a comprehensive implementation quickly becomes unmanageable.
It is also unnecessary, because interference from a single transmitter will be negligible
except in a small geographic area; mathematically, the channel matrix from the whole
virtual array to all of the receivers will be very sparse. The main network-level
problem is therefore to find some reasonable compromise between partitioning and full
coordination.

Appendix A


Ordering  of  Two  Streams  to 
Maximize  Sum  Capacity 


We wish to show that in a two-receiver scenario with precoding, the sum capacity is
maximized by choosing the first receiver to be the one with the best channel, i.e., the
largest $\|\mathbf{h}_k\|^2$.

Recall that before power control, the receiver that is ordered first gets its full
single-user SNR and that the product of SNRs to the two receivers is independent
of the ordering. Therefore, we can show the above by proving that, given a constant
product of SNRs, the sum capacity is monotonically increasing with the maximum
value of the two SNRs.
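The two recalled facts can be checked numerically under one common model of spatial precoding, which is an assumption of this sketch rather than a restatement of the earlier chapters: the channel matrix, with one row per receiver in the chosen order, is factored into a lower-triangular matrix times a matrix with orthonormal rows, and each receiver's SNR gain is the squared magnitude of the corresponding diagonal entry (unit transmit power and noise variance). The function name `precoding_snrs` is hypothetical.

```python
import numpy as np

def precoding_snrs(H):
    """SNR gains before power control for the receivers (rows of H) in order.

    A QR factorization of H^H gives H = R^H Q^H, so the lower-triangular
    factor is R^H and its diagonal magnitudes equal those of R.
    """
    _, R = np.linalg.qr(H.conj().T)
    return np.abs(np.diag(R)) ** 2

rng = np.random.default_rng(0)
# Two receivers, four transmit antenna elements, complex Gaussian channels.
H = rng.standard_normal((2, 4)) + 1j * rng.standard_normal((2, 4))

snr_order_12 = precoding_snrs(H)        # receiver 1 ordered first
snr_order_21 = precoding_snrs(H[::-1])  # receiver 2 ordered first

print(snr_order_12, snr_order_21)
print(np.prod(snr_order_12), np.prod(snr_order_21))
```

In this model the first-ordered receiver's gain equals its squared channel norm, and the product of the two gains equals the determinant of $HH^H$, which does not change when the rows are swapped.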

Suppose without loss of generality that with a particular ordering, the SNRs before
power control are $\beta_1$ and $\beta_2$, where $\beta_1 > \beta_2$. Power control gives a fraction $\alpha$ of power
to receiver 1. The sum capacity, given perfect information embedding, is then

\[
C = \log_2(1 + \alpha\beta_1) + \log_2\bigl(1 + (1 - \alpha)\beta_2\bigr).
\]

By taking the derivative, we see that this is maximized with the waterfilling solution

\[
\alpha = \min\left(1,\; \frac{\beta_1 - \beta_2 + \beta_1\beta_2}{2\beta_1\beta_2}\right).
\]

If $\alpha = 1$, that is, if waterfilling gives all the power to one receiver, then capacity
is maximized by choosing $\beta_1$ as large as possible. Therefore, this case is proved.
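For completeness, the derivative step behind the waterfilling expression can be written out as follows. This is a routine calculation added for the reader, using $\alpha$ for the power fraction and $\beta_1$, $\beta_2$ for the two SNRs before power control.

```latex
\begin{align*}
\frac{\partial C}{\partial \alpha}
  &= \frac{1}{\ln 2}\left(\frac{\beta_1}{1 + \alpha\beta_1}
     - \frac{\beta_2}{1 + (1 - \alpha)\beta_2}\right) = 0 \\
\Longrightarrow\quad
  \beta_1\bigl(1 + (1 - \alpha)\beta_2\bigr) &= \beta_2\,(1 + \alpha\beta_1) \\
\Longrightarrow\quad
  \alpha &= \frac{\beta_1 - \beta_2 + \beta_1\beta_2}{2\beta_1\beta_2}.
\end{align*}
```

Because $C$ is concave in $\alpha$, this stationary point is the maximum whenever it lies in $[0, 1]$. With $\beta_1 > \beta_2$ the expression is always at least $1/2$, and it reaches or exceeds 1 exactly when $\beta_1 - \beta_2 \ge \beta_1\beta_2$, in which case all power goes to receiver 1.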

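From now on, then, assume that nonzero power is sent to both receivers. The sum
capacity is

\[
\begin{aligned}
C &= \log_2\left(1 + \frac{\beta_1 - \beta_2 + \beta_1\beta_2}{2\beta_1\beta_2}\,\beta_1\right)
   + \log_2\left(1 + \frac{\beta_2 - \beta_1 + \beta_1\beta_2}{2\beta_1\beta_2}\,\beta_2\right) \\
  &= \log_2\left(1 + \frac{\beta_1^2 + \beta_2^2 - 2\beta_1\beta_2 + 2\beta_1^2\beta_2 + 2\beta_1\beta_2^2 + \beta_1^2\beta_2^2}{4\beta_1\beta_2}\right).
\end{aligned}
\tag{A.1}
\]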


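Next, let $c$ be the product of SNRs, $c = \beta_1\beta_2$. The sum capacity in (A.1) becomes

\[
C = \log_2\left(1 + \frac{\beta_1^2 + \dfrac{c^2}{\beta_1^2} - 2c + 2\beta_1 c + \dfrac{2c^2}{\beta_1} + c^2}{4c}\right).
\]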

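We wish to show that for a constant $c$, this is monotonically increasing in $\beta_1$. Since
terms that are only functions of $c$ will not affect this property, this is equivalent to
showing that

\[
c_1 = \beta_1^2 + \frac{c^2}{\beta_1^2} + 2\beta_1 c + \frac{2c^2}{\beta_1}
\]

is monotonically increasing in $\beta_1$. Taking derivatives,

\[
\frac{\partial c_1}{\partial \beta_1} = 2\beta_1 - \frac{2c^2}{\beta_1^3} + 2c - \frac{2c^2}{\beta_1^2},
\qquad
\frac{\partial^2 c_1}{\partial \beta_1^2} = 2 + \frac{6c^2}{\beta_1^4} + \frac{4c^2}{\beta_1^3}.
\]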

The first derivative is zero at $\beta_1 = \beta_2 = \sqrt{c}$ and the second derivative is always
positive. Since we assumed that $\beta_1 > \beta_2$, we have $\beta_1 > \sqrt{c}$, so the first derivative is
positive throughout the region of interest. Hence $c_1$ is monotonically increasing in $\beta_1$,
and consequently, the sum capacity is monotonically increasing in $\beta_1$. This proves the
case when both streams are sent with nonzero power.
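The claim of this appendix can also be spot-checked numerically. The sketch below is only an illustration for one arbitrarily chosen product of SNRs, not part of the proof: it evaluates the waterfilling sum capacity for pairs of SNRs with a fixed product and looks at how it changes as the larger SNR grows. The function name and the test values are assumptions of this example.

```python
import numpy as np

def sum_capacity(beta1, beta2):
    """Waterfilling sum capacity in bits, for SNRs beta1 >= beta2 > 0."""
    alpha = min(1.0, (beta1 - beta2 + beta1 * beta2) / (2 * beta1 * beta2))
    return np.log2(1 + alpha * beta1) + np.log2(1 + (1 - alpha) * beta2)

c = 4.0  # fixed product of the two SNRs (arbitrary test value)
beta1_values = np.linspace(np.sqrt(c), 50.0, 200)  # larger SNR, from sqrt(c) upward
capacities = np.array([sum_capacity(b1, c / b1) for b1 in beta1_values])

# Successive differences show whether capacity grows with the larger SNR.
differences = np.diff(capacities)
print(differences.min())
```

The grid covers both regimes treated above: power split between the two receivers for the smaller values of the larger SNR, and all power given to receiver 1 once the waterfilling fraction saturates at 1.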
Appendix B


2-Bit  Signaling  in  Larger-Order 
QAM  Interference 


Consider a 4-QAM embedding (that is, two bits of information) for receiver $k$ with
power constraint $P_k$. We look at the case when the interfering signal behaves like a
higher-order QAM constellation. For simplicity, assume that this interference
constellation has infinite extent, and has equal-probability points spaced 2Q apart in
both the real and imaginary directions.

If the interference is large enough, we can surround each constellation point with
an embedding constellation, as in Fig. 3-11c, and suffer no precoding power loss. As
we have seen, though, this only works if the spacing between interference points is
large enough. When this is not true, we can match a larger number of interference
points with each quartet of embedding points. Fig. B-1 demonstrates an embedding
where each 4-QAM set surrounds four interference points. An interference point will
get quantized to one member of the surrounding embedding quartet, selected by the
input bit pair. If there are interference points for each quartet, then the embedding
tiling will not overlap as long as


A  > 


0 


(B.l) 


where  2(  is  the  spacing  between  embedding  constellation  points  (of  different  types) 
and $\lceil\cdot\rceil$ is the ceiling operator.

Figure B-1: Sample embedding of 4-QAM inside large-order QAM interference.
(a) One embedding quartet. (b) Full embedding. In this example, each "embedding
constellation" surrounds four possible interference points.

For this type of embedding, the average transmitted power is the sum of the
powers of a quartet of embedding points and of the set of interference points with
which it is matched, if we assume that both groups of points are centered at the origin.
With a set of interference points.

The second term in the formula represents the precoding power loss, so it is clear
that we want to surround as few interference points as possible, working at the lower
bound of (B.1). The precoding power loss is shown in Fig. B-2. As the set of
interference points becomes more dense, its discrete distribution gets closer to a uniform
distribution, so it is not surprising that precoding power loss approaches that of
uniformly distributed interference.
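The two-term structure of the average power can be illustrated with a small enumeration. The geometry and the names `d`, `q`, and `L` below are assumptions of this sketch and not the notation of the appendix: the embedding quartet sits at $(\pm d, \pm d)$, the matched interference points form an $L \times L$ square grid with spacing $2q$, both sets are centered at the origin, and the transmitted signal is the chosen embedding point minus the interference point.

```python
import itertools

def average_power(d, q, L):
    """Average |embedding point - interference point|^2 over all pairings."""
    quartet = [complex(sr * d, si * d) for sr in (-1, 1) for si in (-1, 1)]
    axis = [q * (2 * i - (L - 1)) for i in range(L)]   # grid centered at 0
    block = [complex(x, y) for x in axis for y in axis]
    total = sum(abs(e - s) ** 2 for e, s in itertools.product(quartet, block))
    return total / (len(quartet) * len(block))

def closed_form(d, q, L):
    """Quartet power plus the power of the centered L-by-L interference grid."""
    return 2 * d ** 2 + 2 * q ** 2 * (L ** 2 - 1) / 3

for L in (1, 2, 4, 8):
    print(L, average_power(1.0, 0.5, L), closed_form(1.0, 0.5, L))
```

Under these assumptions the second term, $2q^2(L^2 - 1)/3$, plays the role of the precoding power loss. For a fixed block width $2Lq$ it tends to the power of interference distributed uniformly over that square as $L$ grows, in line with the limiting behavior described above.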

A slightly different perspective, perhaps more in line with system goals, would be
to maximize the distance 2( given a power constraint $P_k$. To satisfy both the power
constraint and (B.1), we may at times have to transmit with a power lower than $P_k$.

Note that the tiling in Fig. B-1b once again looks like a uniform quantizer but
where the embedding points were pulled back to the center of each set, just as in
distortion compensation. However, in true distortion compensation, the reconstruction
points would be pulled back in the direction of the interference points themselves,
not to the center of each set of points.